TESTING CONDITIONAL INDEPENDENCE VIA
ROSENBLATT TRANSFORMS 

By Kyungchul Song 

University of Pennsylvania 

This paper proposes new tests of conditional independence of two 
random variables given a single-index involving an unknown finite- 
dimensional parameter. The tests employ Rosenblatt transforms and 
are shown to be distribution-free while retaining computational con- 
venience. Some results from Monte Carlo simulations are presented 
and discussed. 

1. Introduction. Suppose that $Y$ and $Z$ are random variables, and let
$\lambda_\theta(X)$ be a real function of a random vector $X$ indexed by a parameter
$\theta \in \Theta \subset \mathbf{R}^d$. The function $\lambda_\theta(\cdot)$ is known up to
$\theta \in \Theta$. For example, we may consider $\lambda_\theta(X) = h(X^{\top}\theta)$ for some
known function $h$. Suppose that an estimable parameter $\theta_0 \in \Theta$ is given. This
paper proposes a distribution-free method of testing conditional independence
of $Y$ and $Z$ given $\lambda_{\theta_0}(X)$,

(1) $Y \perp\!\!\!\perp Z \mid \lambda_{\theta_0}(X)$.

When $Y$ and $Z$ are conditionally independent given $\lambda_{\theta_0}(X)$, it means that
"learning the value of $Z$ does not provide additional information about $Y$,
once we know $\lambda_{\theta_0}(X)$" [Pearl (2000), page 11]. Hence conditional independence
is a central notion in modeling causal relations, and its importance in
graphical modeling is widely known [e.g., Lauritzen (1996), Pearl (2000)]. In
the literature of program evaluations, testing conditional independence of
the observed outcome and the treatment decision given observed covariates
can serve as testing lack of treatment effects under the assumption of strong
ignorability [Heckman, Ichimura and Todd (1997)]. Conditional independence
is sometimes a direct implication of economic theory. For example,
in the literature of insurance, the presence of positive conditional dependence
between coverage and risk is known to be a direct consequence of
adverse selection under information asymmetry [e.g., Chiappori and Salanié
(2000)].

Testing independence for continuous variables has drawn the attention of
many researchers. To name but a few, see Hoeffding (1948), Blum, Kiefer
and Rosenblatt (1961), Skaug and Tjøstheim (1993), Robinson (1991), Delgado
and Mora (2000) and Hong and White (2005). There has also been
a growth of interest in testing extremal dependence. See a recent paper
by Zhang (2008) and references therein. In contrast, the literature on testing
conditional independence for continuous variables appears rather recent
and includes relatively few studies. See Linton and Gozalo (1997), Delgado
and González Manteiga (2001), Angrist and Kuersteiner (2004) and
Su and White (2008), among others. None of these tests focuses on conditional
independence between continuous variables with unknown $\theta_0$ and is
distribution-free at the same time.

Distribution-free tests have asymptotic critical values that do not change 
as we move from one probability to another within the null hypothesis. 
Many goodness-of-fit tests that have nontrivial asymptotic power against 
$\sqrt{n}$-converging Pitman local alternatives are not distribution-free. To deal
with this problem, the literature either suggests the use of approximate 
critical values through bootstrap or the transformation of the test using the 
innovation martingale approach pioneered by Khmaladze (1993). 

This paper shows that for testing conditional independence, we can generate
distribution-free tests by appropriately using Rosenblatt transforms, a
multivariate version of a probability integral transform studied by Rosenblatt
(1952). Based on the result, this paper proposes a bootstrap method
that is computationally attractive. This bootstrap procedure does not require
the re-estimation of $\theta_0$ for each bootstrap sample. This is convenient
when the dimension of $\theta_0$ is large and its estimation involves numerical
optimization.

The Rosenblatt transform is closely related to the probability integral 
transform of a single-index suggested by Stute and Zhu (2005). However, the 
nature of the problem is distinguished from that of Stute and Zhu (2005). 
First, our test of conditional independence is both omnibus (when the test 
is two-sided) and distribution-free, while the single-index restriction test of 
Stute and Zhu (2005) fails to be distribution-free when it is designed to be 
omnibus. This is purely due to the nature of conditional independence as 
distinct from a single-index restriction. Second, our test contains the proba- 
bility integral transform inside functions that are potentially discontinuous, 
making it cumbersome to rely on the U-process theory [e.g., de la Peña and
Giné (1999)]. This paper deals with this difficulty by directly establishing the
bracketing entropy bounds for functions involving the probability integral 
transforms. These entropy bounds can be used for other purposes.
This paper is organized as follows. In the next section, we introduce the
basic assumptions and test statistics, and develop asymptotic theory. In 
Section 3, we consider a bootstrap method. Section 4 deals with the case 
where Z is discrete. In Section 5, we present and discuss the results from the 
Monte Carlo simulation study. In the Appendix, we offer the mathematical 
proofs. 

2. Main results. Suppose that we are given a random vector $(Y, Z, X)$
distributed by $P$ and a real-valued function $\lambda_\theta(\cdot)$ on $\mathbf{R}^{d_X}$ which is known
up to a parameter $\theta_0 \in \Theta \subset \mathbf{R}^d$. For brevity, let $\lambda_0(\cdot) = \lambda_{\theta_0}(\cdot)$. Throughout
this paper, we assume that $\lambda_0(X)$ is continuous. Define $F_0(\cdot)$ to be the
distribution function of $\lambda_0(X)$ and let $U = F_0(\lambda_0(X))$,
$F_{Y|U}(\cdot|U) = P\{Y \leq \cdot \mid U\}$, and $F_{Z|U}(\cdot|U) = P\{Z \leq \cdot \mid U\}$. Then, the main focus of
this paper is on testing the following hypothesis:

(2) $H_0: P\{Y \leq y, Z \leq z \mid U\} = F_{Y|U}(y|U) F_{Z|U}(z|U)$ wp 1,

for all $(y, z)$ in the support of $(Y, Z)$. The notation "wp 1" means that the
statement holds with probability one with respect to the distribution of $U$.
Certainly, this hypothesis is equivalent to (1) with probability one because
$F_0(\cdot)$ is continuous.

Throughout the paper, the norm $\|\cdot\|_\infty$ represents the sup norm, $\|\cdot\|$ the
Euclidean norm and $\|\cdot\|_p$, $p \geq 1$, the $L_p(P)$-norm. Let
$B(\theta_0, \delta) = \{\theta \in \Theta : \|\theta - \theta_0\| < \delta\}$. Define

$\tilde{Y} = F_{Y|U}(Y|U)$ and $\tilde{Z} = F_{Z|U}(Z|U)$;

then $(\tilde{Z}, U)$ is distributed as the joint distribution of two independent
uniform $[0,1]$ random variables, and so is $(\tilde{Y}, U)$, if $(Z, \lambda_0(X))$ and $(Y, \lambda_0(X))$
are continuous [Rosenblatt (1952)]. The transform of $(Z, \lambda_0(X))$ into $(\tilde{Z}, U)$
is called the Rosenblatt transform, due to Rosenblatt (1952). Let
$f_{Y|Z,\theta}(y \mid z, \bar\lambda_1, \bar\lambda_0)$ be the conditional density of $\tilde{Y}$ given
$(\tilde{Z}, \lambda_\theta(X), \lambda_0(X)) = (z, \bar\lambda_1, \bar\lambda_0)$ with respect to a $\sigma$-finite conditional
measure. We also define $f_{Z|Y,\theta}(z \mid y, \bar\lambda_1, \bar\lambda_0)$
similarly by interchanging the roles of $Y$ and $Z$.
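
As an illustrative check of the uniformity and independence claim for the Rosenblatt transform, the following Python sketch uses a toy Gaussian model in which the conditional distribution function is known in closed form. The model ($\lambda_0(X) \sim N(0,1)$, $Z = \lambda_0(X) + e$ with $e \sim N(0,1)$) and the names `rosenblatt_sample` and `correlation` are assumptions made for this example only; they do not come from the paper.

```python
import math
import random


def normal_cdf(t):
    """Standard normal c.d.f., written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def rosenblatt_sample(n, seed=0):
    """Toy model (assumption of this sketch): lam ~ N(0,1), Z = lam + e, e ~ N(0,1).

    Returns the transformed values (Z_tilde, U), where U = F_0(lam) and
    Z_tilde = F_{Z|U}(Z | U) = Phi(Z - lam) because F_0 is strictly increasing.
    """
    rng = random.Random(seed)
    z_tilde, u = [], []
    for _ in range(n):
        lam = rng.gauss(0.0, 1.0)
        z = lam + rng.gauss(0.0, 1.0)
        u.append(normal_cdf(lam))
        z_tilde.append(normal_cdf(z - lam))
    return z_tilde, u


def correlation(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    z_tilde, u = rosenblatt_sample(20000)
    print("mean of Z_tilde:", sum(z_tilde) / len(z_tilde))  # should be near 0.5
    print("corr(Z_tilde, U):", correlation(z_tilde, u))     # should be near 0
```

In this toy model $Z$ and $\lambda_0(X)$ are strongly dependent, yet the transformed pair behaves like two independent uniform variables, which is the property the tests below exploit.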

Assumption 1. (i) $(Y, Z, \lambda_0(X))$ is continuous.
(ii) For some $\delta > 0$,

(a) $\lambda_\theta(\cdot)$, $\theta \in B(\theta_0, \delta)$, is uniformly bounded and Lipschitz in $\theta$, that
is, for any $\theta_1, \theta_2 \in B(\theta_0, \delta)$,

$\|\lambda_{\theta_1} - \lambda_{\theta_2}\|_\infty \leq C\|\theta_1 - \theta_2\|$ for some $C > 0$, and

(b) $\lambda_\theta(X)$ is continuous, having a density function bounded uniformly
over $\theta \in B(\theta_0, \delta)$.
(iii) For some $\delta > 0$, $f_{Z|Y,\theta}(z \mid y, \cdot, \bar\lambda_0)$ and $f_{Y|Z,\theta}(y \mid z, \cdot, \bar\lambda_0)$ are continuously
differentiable with derivatives bounded uniformly over
$(y, z, \bar\lambda_0, \theta) \in [0,1]^2 \times \mathbf{R} \times B(\theta_0, \delta)$.

Later in the paper (Section 4), we deal with the case where either $Y$
or $Z$ is discrete. The uniform boundedness condition in (ii) is innocuous,
because by choosing a strictly increasing function $\Phi$ on $[0,1]$, we can redefine
$\lambda_\theta$ as $\Phi \circ \lambda_\theta$. The Lipschitz continuity in $\theta$ can be made to hold by choosing
this $\Phi$ appropriately. The absolute continuity condition in (ii)(b) is satisfied
in particular when $\lambda_\theta(X) = h(X^{\top}\theta)$ with a continuous, strictly increasing
function $h$ and $X^{\top}\theta$ is continuous.

Define

$\gamma_z(\cdot) = z\exp(\cdot \times z)$ and $\gamma_z^{\perp}(\cdot) = \gamma_z(\cdot) - \{\exp(z) - 1\}$.

For a class of functions $\beta_u(\cdot)$, $u \in [0,1]$, consider the following null
hypothesis:

(3) $H_0: E[\beta_u(U)\gamma_z^{\perp}(\tilde{Z})\gamma_y^{\perp}(\tilde{Y})] = 0$ $\forall (u, y, z) \in [0,1]^3$.

The lemma below establishes that under Assumption 1(i), and an appropriate
condition for $\beta_u$, the null hypothesis in (2) and the null hypothesis in
(3) are equivalent. The result relies on Lemma 1 of Bierens (1990).
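
The centering constant in the second function above is the mean of $z\exp(tz)$ over $t \in [0,1]$, since $\int_0^1 z\exp(tz)\,dt = \exp(z) - 1$; hence the centered function integrates to zero against the uniform distribution. The short Python sketch below checks this numerically. The function names `gamma`, `gamma_perp` and `uniform_mean` are chosen for this illustration and are not from the paper.

```python
import math


def gamma(z, t):
    """The weight function z * exp(t * z) evaluated at t."""
    return z * math.exp(t * z)


def gamma_perp(z, t):
    """Centered version: subtract the uniform-[0,1] mean exp(z) - 1."""
    return gamma(z, t) - (math.exp(z) - 1.0)


def uniform_mean(f, m=10000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    return sum(f((k + 0.5) / m) for k in range(m)) / m


if __name__ == "__main__":
    for z in (0.25, 0.5, 1.0):
        # Each printed value should be numerically close to zero.
        print(z, uniform_mean(lambda t: gamma_perp(z, t)))
```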

Lemma 1. Suppose that Assumption 1(i) is satisfied. Furthermore, assume
that the class $\{\beta_u(\cdot) : u \in [0,1]\}$ is such that (3) implies the following:

(4) $E[\gamma_z^{\perp}(\tilde{Z})\gamma_y^{\perp}(\tilde{Y}) \mid U] = 0$ wp 1 $\forall (y, z) \in [0,1]^2$.

Then, the hypothesis in (2) and the hypothesis in (3) are equivalent.

PROOF. It is easy to see that the conditional independence (2) implies
(3). We prove the converse. First, we show that the conditional independence
of $\tilde{Y}$ and $\tilde{Z}$ given $U$ implies (2). Suppose that this conditional independence
holds. Let $\tilde{y} = F_{Y|U}(y|U)$ and $\tilde{z} = F_{Z|U}(z|U)$ for brevity. Write

$P\{\tilde{Y} \leq \tilde{y}, \tilde{Z} \leq \tilde{z} \mid U\}$

$= P\{\tilde{Y} \leq \tilde{y}, \tilde{Z} \leq \tilde{z}, Y \leq y, Z \leq z \mid U\}$

(5) $+ P\{\tilde{Y} \leq \tilde{y}, \tilde{Z} \leq \tilde{z}, Y > y, Z \leq z \mid U\}$

$+ P\{\tilde{Y} \leq \tilde{y}, \tilde{Z} \leq \tilde{z}, Y \leq y, Z > z \mid U\}$

$+ P\{\tilde{Y} \leq \tilde{y}, \tilde{Z} \leq \tilde{z}, Y > y, Z > z \mid U\}$.

Following Angus (1994), the second probability on the right-hand side is
bounded by

$P\{\tilde{Y} \leq \tilde{y}, Y > y \mid U\} = P\{\tilde{Y} = \tilde{y}, Y > y \mid U\}$

$= P\{F_{Y|U}(Y|U) = F_{Y|U}(y|U), Y > y \mid U\} = 0$,

because conditional on $U = u$, the event in the last probability is contained
in the event of $Y$ lying in the interior of an interval of constancy of $F_{Y|U}(\cdot|u)$.
The conditional probability measure of this event is certainly zero. Similarly,
the last two probabilities in (5) can also be shown to be zero. If $Y \leq y$ and
$Z \leq z$, then $\tilde{Y} \leq \tilde{y}$ and $\tilde{Z} \leq \tilde{z}$. Therefore, we obtain from (5) that

(6) $P\{\tilde{Y} \leq \tilde{y}, \tilde{Z} \leq \tilde{z} \mid U\} = P\{Y \leq y, Z \leq z \mid U\}$.

Using a similar argument, we can also obtain that

(7) $P\{Y \leq y \mid U\} = P\{\tilde{Y} \leq \tilde{y} \mid U\} = \tilde{y}$ and
$P\{Z \leq z \mid U\} = P\{\tilde{Z} \leq \tilde{z} \mid U\} = \tilde{z}$,

because $\tilde{Y}$ is uniformly distributed on $[0,1]$ and independent of $U$, and so
is $\tilde{Z}$. Conditional independence of $\tilde{Y}$ and $\tilde{Z}$ given $U$ implies (2) through (6)
and (7).

Now, we show that (3) implies conditional independence of $\tilde{Y}$ and $\tilde{Z}$ given
$U$. Let $f(t_1, t_2 \mid u) : [0,1]^2 \to [0, \infty)$ be the conditional density of $(\tilde{Z}, \tilde{Y})$ given
$U = u$. Through (4), (3) implies that for all $(z, y) \in [0,1]^2$,

$zy \int_0^1 \int_0^1 e^{z t_1 + y t_2} \{f(t_1, t_2 \mid U) - 1\} \, dt_1 \, dt_2 = 0$ wp 1.

By Lemma 1 of Bierens (1990) [see also Stinchcombe and White (1998), page
4], $f(t_1, t_2 \mid U) = 1\{(t_1, t_2) \in [0,1]^2\}$, a.e., for almost every $(t_1, t_2) \in [0,1]^2$,
yielding conditional independence of $\tilde{Y}$ and $\tilde{Z}$ given $U$. □

The condition for $\beta_u(\cdot)$ in Lemma 1 is explained in Stinchcombe and
White (1998). For example, the choice of $\beta_u(U) = 1\{U \leq u\}$ or
$\beta_u(U) = \exp(Uu)$ satisfies this condition [see Bierens (1990), Lemma 1, for the latter
choice]. From now on, we assume that $\beta_u(\cdot)$ satisfies the condition in Lemma
1 and focus on the null hypothesis in (3). This condition for $\beta_u(\cdot)$ is not used
for the weak convergence theory in Theorem 1 below.

Assumption 2. (i) $\beta_u(\cdot)$, $u \in [0,1]$, is uniformly bounded in $[0,1]$, and
for each $u \in [0,1]$, $\beta_u(\cdot)$ is of bounded variation.

(ii) $F_{Y|U}(y \mid \cdot)$ and $F_{Z|U}(z \mid \cdot)$ are twice continuously differentiable with
derivatives bounded uniformly over $(z, y) \in \mathbf{R}^2$.

Assumption 2(i) is very weak and satisfied by most functions used in the
literature. This flexibility in choosing the class $\beta_u$ is important because the
choice of $\beta_u$ plays a significant role in determining the asymptotic power
properties of the test in general. Assumption 2(ii) is analogous to Condition
A(i) in Theorem 2.1 of Stute and Zhu (2005) or A2 of Delgado and González
Manteiga (2001) on page 1475.
Throughout this paper, we assume that the observations $(Y_i, X_i, Z_i)_{i=1}^n$
are i.i.d. from $P$. We also assume that the parameter $\theta_0$ is identified from
data and estimable. For example, in the literature of program evaluations,
this assumption is satisfied because the parameter $\theta_0$ constitutes the
single-index in the propensity score. Let $\hat\theta$ be a consistent estimator of $\theta_0$, and
define

$\hat{U}_i = F_{n,\hat\theta,i}(\lambda_{\hat\theta}(X_i))$, $\hat{Z}_i = \hat{F}_{Z|U,i}(Z_i \mid \hat{U}_i)$ and $\hat{Y}_i = \hat{F}_{Y|U,i}(Y_i \mid \hat{U}_i)$,

where $F_{n,\hat\theta,i}(\bar\lambda) = \frac{1}{n}\sum_{j=1, j \neq i}^{n} 1\{\lambda_{\hat\theta}(X_j) \leq \bar\lambda\}$ and

(8) $\hat{F}_{Y|U,i}(y \mid u) = \dfrac{\sum_{j=1, j \neq i}^{n} 1\{Y_j \leq y\} K_h(\hat{U}_j - u)}{\sum_{j=1, j \neq i}^{n} K_h(\hat{U}_j - u)}$,

where $K_h(x) = K(x/h)/h$, $K(\cdot)$ is a kernel function and $h$ is the bandwidth
parameter. We similarly define $\hat{F}_{Z|U,i}(z \mid u)$. As for the estimator $\hat\theta$, the kernel
and the bandwidth, we assume the following.

Assumption 3. (i) $\|\hat\theta - \theta_0\| = O_P(n^{-1/2})$.

(ii) (a) $K$ is symmetric, nonnegative, twice continuously differentiable,
has a compact support, and $\int_{-\infty}^{\infty} K(s)\,ds = 1$.

(b) $h = Cn^{-s}$ with $1/6 < s < 1/4$ for some $C > 0$.
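
To make the leave-one-out construction of the estimated transforms concrete, here is a minimal Python sketch of the empirical index transform and of the kernel estimator of a conditional distribution function. It is a sketch under stated assumptions rather than the author's implementation: the quartic kernel is the one reported in the simulation section, the normalisation of the empirical distribution function by $n$ is our reading of the definition, and the names `empirical_u` and `loo_conditional_cdf` are hypothetical.

```python
def quartic_kernel(s):
    """Quartic kernel (15/16)(1 - s^2)^2 supported on [-1, 1]."""
    return (15.0 / 16.0) * (1.0 - s * s) ** 2 if abs(s) <= 1.0 else 0.0


def empirical_u(index_values):
    """Leave-one-out empirical c.d.f. of the index, evaluated at each observation.

    Assumption of this sketch: the count over j != i is divided by n.
    """
    n = len(index_values)
    return [
        sum(1 for j in range(n) if j != i and index_values[j] <= index_values[i]) / n
        for i in range(n)
    ]


def loo_conditional_cdf(i, y, u, ys, us, h):
    """Kernel estimate of P{Y <= y | U = u} that leaves out observation i.

    The factor 1/h in K_h cancels between numerator and denominator.
    Returns NaN when no other observation falls inside the kernel window.
    """
    num = den = 0.0
    for j in range(len(ys)):
        if j == i:
            continue
        w = quartic_kernel((us[j] - u) / h)
        den += w
        if ys[j] <= y:
            num += w
    return num / den if den > 0.0 else float("nan")


if __name__ == "__main__":
    us = empirical_u([0.3, -1.2, 0.8, 0.1, 2.0])
    ys = [1.0, 2.0, 3.0, 4.0, 5.0]
    print([loo_conditional_cdf(i, ys[i], us[i], ys, us, h=0.5) for i in range(5)])
```

The estimated transform of the $i$-th outcome is obtained by evaluating `loo_conditional_cdf` at the observation's own outcome and estimated index, as in the usage lines at the bottom of the block.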

When $\hat\theta$ is an M-estimator, the rate of convergence in (i) can be obtained
following the procedure of Theorem 3.2.5 of van der Vaart and Wellner
(1996). The estimation method of $\theta_0$ depends on a further specification of
the testing environment. For example, the conditioning variable $\lambda_{\theta_0}(X_i)$ may
originate from the nonlinear regression model,

$W_i = \lambda_{\theta_0}(X_i) + \varepsilon_i$,

where $\varepsilon_i$ satisfies $E[\varepsilon_i \mid X_i] = 0$. The $\sqrt{n}$-consistent estimation of $\theta_0$ in this
case is well known in the literature [see, e.g., van de Geer (2000)]. Assumption
3(ii)(a) is used by Stute and Zhu (2005). Unlike their procedure, the
bandwidth condition in (b) does not require undersmoothing.
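Define the infeasible and feasible processes, for $r = (u, y, z) \in [0,1]^3$,

$\nu_n(r) = \dfrac{1}{\sqrt{n}} \sum_{i=1}^{n} \beta_u(U_i)\gamma_z^{\perp}(\tilde{Z}_i)\gamma_y^{\perp}(\tilde{Y}_i)$

and

$\hat\nu_n(r) = \dfrac{1}{\sqrt{n}} \sum_{i=1}^{n} \beta_u(\hat{U}_i)\gamma_z^{\perp}(\hat{Z}_i)\gamma_y^{\perp}(\hat{Y}_i)$.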

In the following, we establish weak convergence of both processes. The main
complication is that the condition for $\beta_u$ (Assumption 2) is too weak to resort
to linearization in handling the estimation error of $\hat{U}_i$. We deal with this
difficulty by establishing a bracketing entropy bound for functions composite
with bounded variation functions (see Lemma A1 in the Appendix). Let
$l_\infty([0,1]^3)$ denote the space of real functions on $[0,1]^3$ that are bounded,
and endowed with a sup norm $\|\cdot\|_\infty$ defined by $\|f\|_\infty = \sup_{r \in [0,1]^3} |f(r)|$.
The notation $\rightsquigarrow$ denotes weak convergence in $l_\infty([0,1]^3)$ in the sense of
Hoffmann-Jørgensen [e.g., van der Vaart and Wellner (1996)]. Let $\langle \cdot, \cdot \rangle$ be
the inner product on $L_2(du) \times L_2(du)$ defined by $\langle f, g \rangle = \int f(u)g(u)\,du$.

Theorem 1. Suppose that Assumptions 1-3 hold. Then the following
holds:

(i) $\sup_{r \in [0,1]^3} |\hat\nu_n(r) - \nu_n(r)| = o_P(1)$, both under $H_0$ in (3) and under
Pitman local alternatives $P_n$ such that for some functions $a_j : [0,1]^3 \to [-1, 1]$,
$j = 1, 2$,

$E_{P_n}[\gamma_z^{\perp}(\tilde{Z}) \mid \tilde{Y} = y, U = u] = n^{-1/2} a_1(z, y, u)$

and

$E_{P_n}[\gamma_y^{\perp}(\tilde{Y}) \mid \tilde{Z} = z, U = u] = n^{-1/2} a_2(z, y, u)$,

where $E_{P_n}[\cdot \mid \tilde{Y} = y, U = u]$ and $E_{P_n}[\cdot \mid \tilde{Z} = z, U = u]$ denote conditional
expectations under $P_n$.

(ii) $\nu_n \rightsquigarrow \nu$ in $l_\infty([0,1]^3)$, under $H_0$ in (3), where $\nu$ is a centered Gaussian
process whose covariance kernel is given by

$c(r_1; r_2) = \langle \beta_{u_1}, \beta_{u_2} \rangle \langle \gamma_{z_1}^{\perp}, \gamma_{z_2}^{\perp} \rangle \langle \gamma_{y_1}^{\perp}, \gamma_{y_2}^{\perp} \rangle$.

The asymptotic representation in (i) shows the interesting fact that $\hat\nu_n(r)$
is asymptotically equivalent to $\nu_n(r)$. Remarkably, the estimation error in $\hat\theta$
does not play a role in shaping the asymptotic distribution of the process
$\hat\nu_n(r)$. This finding is analogous to what Stute and Zhu (2005) found in the
context of testing a single-index restriction.

Based on the result in Theorem 1, we can construct a test statistic

(9) $T_n = \Gamma\hat\nu_n$

by taking a continuous functional $\Gamma$. For example, in the case of two-sided
tests, we may take

(10) $\Gamma_{KS}\hat\nu_n = \sup_{r \in [0,1]^3} |\hat\nu_n(r)|$ or $\Gamma_{CM}\hat\nu_n = \left( \int_{[0,1]^3} \hat\nu_n(r)^2 \, dr \right)^{1/2}$.

The first example is of Kolmogorov-Smirnov type and the second one is
of Cramér-von Mises type. Asymptotic unbiasedness for these tests against
$\sqrt{n}$-converging local alternatives can be established using Anderson's lemma.
In the case of one-sided tests, we may take

$\Gamma_{KS}^{+}\hat\nu_n = \sup_{r \in [0,1]^3} \hat\nu_n(r)$ or $\Gamma_{CM}^{+}\hat\nu_n = \left( \int_{[0,1]^3} \max\{\hat\nu_n(r), 0\}^2 \, dr \right)^{1/2}$.

The asymptotic properties of the tests follow from Theorem 1. Indeed, under
$H_0$,

(11) $T_n = \Gamma\hat\nu_n \to_d \Gamma\nu$.

This test is distribution-free, as the limiting distribution of $\Gamma\nu$ does not
depend on the data generating process under $H_0$.
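
In practice the functionals are evaluated on a finite grid. The following Python sketch shows one way to approximate the Kolmogorov-Smirnov-type and Cramér-von Mises-type functionals, in two-sided and one-sided form, from process values computed at equally spaced grid points of the unit cube. Because the cube has unit volume, the grid average is used as a simple approximation of the integral; this discretisation and the function names are assumptions of the sketch.

```python
import math


def ks_two_sided(values):
    """Supremum of |process| over the grid."""
    return max(abs(v) for v in values)


def cm_two_sided(values):
    """Square root of the grid average of process^2 (approximates the integral)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def ks_one_sided(values):
    """Supremum of the process over the grid."""
    return max(values)


def cm_one_sided(values):
    """Square root of the grid average of max(process, 0)^2."""
    return math.sqrt(sum(max(v, 0.0) ** 2 for v in values) / len(values))


if __name__ == "__main__":
    process_on_grid = [-2.0, 1.0, 0.0, 1.0]  # made-up values for illustration
    print(ks_two_sided(process_on_grid), cm_two_sided(process_on_grid))
    print(ks_one_sided(process_on_grid), cm_one_sided(process_on_grid))
```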



3. Bootstrap tests. The tests introduced so far are distribution-free, but,
in many cases, it is not known how to simulate the Gaussian process $\nu$. In this
section, we suggest a wild bootstrap method in a spirit similar to Delgado
and González Manteiga (2001) [see also, among others, Härdle and Mammen
(1993), Stute, González Manteiga and Quindimil (1998)].

Let $(\{\omega_{i,b}\}_{i=1}^{n})_{b=1}^{B}$ be an i.i.d. sequence of random variables that are
bounded, independent of $\{Y_i, Z_i, X_i\}$, with $E(\omega_{i,b}) = 0$ and $E(\omega_{i,b}^2) = 1$. For
example, one can take $\omega_{i,b}$ with a two-point distribution assigning masses
$(\sqrt{5} + 1)/(2\sqrt{5})$ and $(\sqrt{5} - 1)/(2\sqrt{5})$ to the points $-(\sqrt{5} - 1)/2$ and $(\sqrt{5} + 1)/2$.
Let

$\nu_{n,b}^{*}(r) = \dfrac{1}{\sqrt{n}} \sum_{i=1}^{n} \omega_{i,b}\,\beta_u(\hat{U}_i)\gamma_z^{\perp}(\hat{Z}_i)\gamma_y^{\perp}(\hat{Y}_i)$, $b = 1, \ldots, B$.

The bootstrap empirical process $\nu_{n,b}^{*}(r)$ is similar to those proposed by Delgado
and González Manteiga (2001). Given a functional $\Gamma$, we can define
bootstrap test statistics $T_{n,b}^{*} = \Gamma\nu_{n,b}^{*}$, $b = 1, \ldots, B$. An $\alpha$-level critical value
is approximated by $c_{\alpha,n,B} = \inf\{t : B^{-1}\sum_{b=1}^{B} 1\{T_{n,b}^{*} \leq t\} \geq 1 - \alpha\}$, yielding the
bootstrap test $1\{T_n > c_{\alpha,n,B}\}$, where $T_n$ is as defined in (9).
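
The bootstrap recipe above can be sketched in a few lines of Python. The block below is an illustration under stated assumptions, not the author's code: it takes the estimated transforms as given inputs, uses the indicator weight function for the index, evaluates the Kolmogorov-Smirnov functional on a user-supplied grid of points $(u, y, z)$, and all function names are hypothetical. Note that the summands are computed once and only the multipliers are redrawn, which is why no re-estimation is needed.

```python
import math
import random

SQRT5 = math.sqrt(5.0)
P_LOW = (SQRT5 + 1.0) / (2.0 * SQRT5)  # mass placed on the negative point
W_LOW = -(SQRT5 - 1.0) / 2.0
W_HIGH = (SQRT5 + 1.0) / 2.0


def two_point_weight(rng):
    """Draw one multiplier from the two-point distribution (mean 0, variance 1)."""
    return W_LOW if rng.random() < P_LOW else W_HIGH


def gamma_perp(z, t):
    """Centered exponential weight z*exp(t*z) - (exp(z) - 1)."""
    return z * math.exp(t * z) - (math.exp(z) - 1.0)


def summand_rows(u_hat, z_hat, y_hat, grid):
    """One row per grid point (u, y, z); one column per observation."""
    return [
        [(1.0 if ui <= u else 0.0) * gamma_perp(z, zi) * gamma_perp(y, yi)
         for ui, zi, yi in zip(u_hat, z_hat, y_hat)]
        for (u, y, z) in grid
    ]


def ks_statistic(rows, weights):
    """sup over the grid of |n^{-1/2} * sum_i weight_i * summand_i|."""
    n = len(weights)
    return max(abs(sum(w * s for w, s in zip(weights, row))) for row in rows) / math.sqrt(n)


def bootstrap_test(u_hat, z_hat, y_hat, grid, alpha=0.05, n_boot=200, seed=0):
    """Return (test statistic, bootstrap critical value, rejection decision)."""
    rng = random.Random(seed)
    n = len(u_hat)
    rows = summand_rows(u_hat, z_hat, y_hat, grid)
    t_n = ks_statistic(rows, [1.0] * n)
    t_star = sorted(
        ks_statistic(rows, [two_point_weight(rng) for _ in range(n)])
        for _ in range(n_boot)
    )
    # Smallest t whose empirical c.d.f. over the bootstrap draws is >= 1 - alpha.
    k = min(n_boot - 1, max(0, math.ceil((1.0 - alpha) * n_boot) - 1))
    crit = t_star[k]
    return t_n, crit, t_n > crit
```

A call such as `bootstrap_test(u_hat, z_hat, y_hat, grid)` returns the statistic, the approximate critical value and the decision; the grid could, for instance, be a regular lattice in the unit cube.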

Let $F_{\Gamma\nu}$ be the distribution of $\Gamma\nu$ and let $F_{T_n^{*}}$ denote the conditional
distribution of the bootstrap test statistic $T_n^{*}$. Define $d(\cdot, \cdot)$ to be a distance
metrizing weak convergence on the real line. [For an introductory exposition
about the weak convergence of bootstrap empirical processes, see Giné
(1997).] The weak convergence follows eventually as a consequence of the
almost sure multiplier CLT of Ledoux and Talagrand (1988).

Theorem 2. Suppose that the conditions of Theorem 1 hold under $H_0$
in (3). Then under $H_0$,

$d(F_{T_n^{*}}, F_{\Gamma\nu}) \to 0$ in $P$.

The wild bootstrap procedure is easy to implement. In particular, one does
not need to re-estimate $\theta_0$ or $(\hat{Z}_i, \hat{Y}_i)$ using each bootstrap sample. It is worth
noting that this desirable property is made possible by our transforming the
test into a distribution-free one.

4. Discrete random variables. The development so far has assumed that 
Y and Z are continuous random variables. In many important applications 
of conditional independence, either Y or Z is discrete, or more often, binary. 
For example, in the literature of program evaluations, the conditional inde- 
pendence restriction involves a binary variable representing the incidence of 
treatment. 

From now on, we assume that $Y$ is continuous and $Z$ is discrete, taking
values from a known, finite set $\mathcal{Z}$. We introduce $(\tilde{Y}, U)$ as before. Define
$p_z(U) = P\{Z = z \mid U\}$ for $z \in \mathcal{Z}$. Similarly as in Lemma 1, we can show that
the null hypothesis in (2) is equivalent to

$H_0: E[\beta_u(U)\{1\{Z = z\} - p_z(U)\}\gamma_y^{\perp}(\tilde{Y})] = 0$ $\forall (y, u, z) \in [0,1]^2 \times \mathcal{Z}$,

if $(Y, \lambda_0(X))$ is continuous and $\beta_u$ satisfies appropriate conditions similar to
those in Lemma 1. We substitute the following for Assumption 1.

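Assumption 1D. (i) $(Y, \lambda_0(X))$ is continuous.
(ii) Assumption 1(ii) holds for $\lambda_\theta$, $\theta \in \Theta$.

(iii) For some $\delta > 0$, $f_{Y|Z,\theta}(y \mid z, \cdot, \bar\lambda_0)$ is continuously differentiable with a
derivative bounded uniformly over $(y, z, \bar\lambda_0, \theta) \in [0,1] \times \mathcal{Z} \times \mathbf{R} \times B(\theta_0, \delta)$.
(iv) For some $\varepsilon > 0$, $p_z(u) \in (\varepsilon, 1 - \varepsilon)$ for all $(u, z) \in [0,1] \times \mathcal{Z}$.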

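Let $\hat{p}_{z,i}(u)$ be a kernel estimator of $p_z(u)$,

$\hat{p}_{z,i}(u) = \dfrac{\sum_{j=1, j \neq i}^{n} 1\{Z_j = z\} K_h(\hat{U}_j - u)}{\sum_{j=1, j \neq i}^{n} K_h(\hat{U}_j - u)}$,

and consider the following process: for $(u, y, z) \in [0,1]^2 \times \mathcal{Z}$,

$\hat\nu_n(u, y, z) = \dfrac{1}{\sqrt{n}} \sum_{i=1}^{n} \dfrac{\beta_u(\hat{U}_i)\{1\{Z_i = z\} - \hat{p}_{z,i}(\hat{U}_i)\}\gamma_y^{\perp}(\hat{Y}_i)}{\sqrt{\hat{p}_{z,i}(\hat{U}_i) - \hat{p}_{z,i}(\hat{U}_i)^2}}$,

where $\hat{Y}_i$ and $\hat{U}_i$ are as defined before.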
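
The standardisation by the estimated conditional variance of the indicator is what removes the dependence on the propensity from the limit. A minimal Python sketch of the leave-one-out propensity estimate and the standardised indicator residual follows. It assumes the quartic kernel from the simulation section, returns NaN when the estimate falls outside the open unit interval (a guard added for this sketch), and uses hypothetical function names.

```python
import math


def quartic_kernel(s):
    """Quartic kernel (15/16)(1 - s^2)^2 supported on [-1, 1]."""
    return (15.0 / 16.0) * (1.0 - s * s) ** 2 if abs(s) <= 1.0 else 0.0


def loo_propensity(i, z, u_hat, z_obs, h):
    """Leave-one-out kernel estimate of P{Z = z | U = u_hat[i]}."""
    num = den = 0.0
    for j in range(len(u_hat)):
        if j == i:
            continue
        w = quartic_kernel((u_hat[j] - u_hat[i]) / h)  # the 1/h factor cancels
        den += w
        if z_obs[j] == z:
            num += w
    return num / den if den > 0.0 else float("nan")


def standardized_residual(i, z, u_hat, z_obs, h):
    """(1{Z_i = z} - p_hat) / sqrt(p_hat - p_hat^2), or NaN if p_hat is not in (0, 1)."""
    p = loo_propensity(i, z, u_hat, z_obs, h)
    if not 0.0 < p < 1.0:  # this comparison is also False for NaN
        return float("nan")
    return ((1.0 if z_obs[i] == z else 0.0) - p) / math.sqrt(p - p * p)


if __name__ == "__main__":
    u_hat = [0.1, 0.2, 0.3, 0.4, 0.5]
    z_obs = [1, 0, 1, 0, 1]
    print(standardized_residual(2, 1, u_hat, z_obs, h=1.0))
```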

Theorem 3. Suppose that Assumptions 1D, 2 and 3 hold. Furthermore,
the conditions for $K$ and $h$ used for $\hat{p}_{z,i}(\cdot)$ are the same as Assumption 3(ii).
Then under $H_0$,

$\hat\nu_n \rightsquigarrow \nu$ in $l_\infty([0,1]^2 \times \mathcal{Z})$,

where $\nu$ is a centered Gaussian process whose covariance kernel is given by

$c(r_1; r_2) = \langle \beta_{u_1}, \beta_{u_2} \rangle \langle \gamma_{y_1}^{\perp}, \gamma_{y_2}^{\perp} \rangle$ if $z_1 = z_2$, and $c(r_1; r_2) = 0$ if $z_1 \neq z_2$.

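Theorem 3 shows that we can generate distribution-free tests based on
$\hat\nu_n$. Test statistics are constructed using an appropriate functional $\Gamma$: for
example,

$\Gamma\hat\nu_n = \sup_{(u,y,z) \in [0,1]^2 \times \mathcal{Z}} |\hat\nu_n(u, y, z)|$

or

$\Gamma\hat\nu_n = \left( \sum_{z \in \mathcal{Z}} \int_{[0,1]^2} \hat\nu_n(u, y, z)^2 \, d(u, y) \right)^{1/2}$.

When the Gaussian process $\nu$ can be simulated, asymptotic critical values
can be read from the distribution of $\Gamma\nu$. When this is not possible or difficult,
one may consider the following bootstrap procedure. Take $\omega_{i,b}$ as in Section
3. Define the bootstrap process

$\nu_{n,b}^{*}(u, y, z) = \dfrac{1}{\sqrt{n}} \sum_{i=1}^{n} \omega_{i,b} \dfrac{\beta_u(\hat{U}_i)\{1\{Z_i = z\} - \hat{p}_{z,i}(\hat{U}_i)\}\gamma_y^{\perp}(\hat{Y}_i)}{\sqrt{\hat{p}_{z,i}(\hat{U}_i) - \hat{p}_{z,i}(\hat{U}_i)^2}}$.

We construct the bootstrap test statistics $T_{n,b}^{*} = \Gamma\nu_{n,b}^{*}$, $b = 1, \ldots, B$, using an
appropriate functional $\Gamma$.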

Theorem 4. Suppose that the conditions of Theorem 3 hold. Then under
$H_0$, $d(F_{T_n^{*}}, F_{\Gamma\nu}) \to 0$ in $P$.

5. Simulation studies. 



5.1. Conditional independence between continuous variables. We sampled
$X_i$ as i.i.d. Unif$[0,1]$ and $Z_i = \alpha X_i + (1 - \alpha)\eta_i$, where $\eta_i \sim$ i.i.d. Unif$[0,1]$
and $\alpha \in \{0.2, 0.5\}$.

We first consider the finite-sample size properties of the bootstrap tests.
For this purpose, the $Y_i$'s were generated as follows:

DGP A1: $Y_i = \Phi((X_i - 0.5)/\sqrt{0.2}) + \varepsilon_i$,
DGP A2: $Y_i = \sin(5X_i) + \varepsilon_i$,

where $\varepsilon_i \sim N(0,1)$ and $\Phi$ denotes the standard normal c.d.f. All the DGPs
allow $Y_i$ to depend on $Z_i$, but only through $X_i$, and hence belong to the null
hypothesis. The DGPs admit different types of nonlinearity in $X_i$.
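
For readers who wish to replicate the design, the two null data generating processes can be simulated as in the Python sketch below. The scale $\sqrt{0.2}$ inside the normal c.d.f. follows our reading of the displayed formulas, and the function name `simulate_null` and its arguments are hypothetical.

```python
import math
import random


def normal_cdf(t):
    """Standard normal c.d.f., written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def simulate_null(n, alpha, dgp="A1", seed=0):
    """Draw n observations (Y, Z, X) under DGP A1 or A2.

    X ~ Unif[0,1], Z = alpha*X + (1 - alpha)*eta with eta ~ Unif[0,1],
    and Y depends on X only, so Y and Z are independent given X.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        z = alpha * x + (1.0 - alpha) * rng.random()
        if dgp == "A1":
            mean = normal_cdf((x - 0.5) / math.sqrt(0.2))
        elif dgp == "A2":
            mean = math.sin(5.0 * x)
        else:
            raise ValueError("dgp must be 'A1' or 'A2'")
        data.append((mean + rng.gauss(0.0, 1.0), z, x))
    return data


if __name__ == "__main__":
    sample = simulate_null(100, alpha=0.2, dgp="A2")
    print(sample[:3])
```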
Table 1

Rejection probabilities under the null hypothesis of conditional independence
among continuous variables

                                        Exp.                          Ind.
DGP   α     h                     1%       5%       10%         1%       5%       10%
A1    0.2   0.25 × n^{-1/5}     0.0140   0.0680   0.1285      0.0210   0.0715   0.1340
            0.50 × n^{-1/5}     (entries illegible in the source)
            1.00 × n^{-1/5}     0.0120   0.0525   0.1100      0.0140   0.0585   0.1125
            2.00 × n^{-1/5}     0.0115   0.0520   0.1090      0.0125   0.0570   0.1025
      0.5   0.25 × n^{-1/5}     0.0225   0.0610   0.1180      0.0160   0.0695   0.1280
            0.50 × n^{-1/5}     0.0140   0.0580   0.1090      0.0125   0.0540   0.1065
            1.00 × n^{-1/5}     0.0105   0.0540   0.0965      0.0120   0.0455   0.1000
            2.00 × n^{-1/5}     0.0195   0.0690   0.1315      0.0170   0.0650   0.1315
A2    0.2   0.25 × n^{-1/5}     0.0200   0.0645   0.1295      0.0155   0.0660   0.1215
            0.50 × n^{-1/5}     0.0120   0.0555   0.1055      0.0100   0.0560   0.1130
            1.00 × n^{-1/5}     0.0115   0.0500   0.1025      0.0100   0.0490   0.1000
            2.00 × n^{-1/5}     0.0120   0.0495   0.1095      0.0130   0.0485   0.1040
      0.5   0.25 × n^{-1/5}     0.0240   0.0765   0.1385      0.0185   0.0690   0.1295
            0.50 × n^{-1/5}     0.0225   0.0640   0.1200      0.0135   0.0700   0.1155
            1.00 × n^{-1/5}     0.0135   0.0635   0.1150      0.0165   0.0495   0.1035
            2.00 × n^{-1/5}     0.0760   0.2185   0.3285      0.0225   0.0775   0.1465

We focus on two types of bootstrap-based tests: one with $\beta_u(U) = 1\{U \leq u\}$
(denoted "Ind." in the tables) and the other with $\beta_u(U) = \exp(Uu)$
(denoted "Exp." in the tables). Nonparametric estimations in the Rosenblatt
transforms were done using kernel estimation with the kernel
$K(u) = (15/16)(1 - u^2)^2 1\{|u| \leq 1\}$. The bandwidths for $\hat{Y}_i$ and $\hat{Z}_i$ were chosen to be
the same, being equal to $h = cn^{-1/5}$ with $c$ ranging in $\{0.25, 0.5, 1, 2\}$. In
constructing Kolmogorov-Smirnov tests, we used $10^3$ equally spaced grid points
in $[0,1]^3$. The number of bootstrap replications and the number of Monte
Carlo replications for the whole procedure were both set to 2000. The
sample size was equal to 100.

Finite-sample sizes are reported in Table 1. The rejection probabilities are
overall stable over different choices of bandwidths for all the tests, although
they are slightly more sensitive to the bandwidth choices in the case of higher
correlation between $X_i$ and $Z_i$ (corresponding to $\alpha = 0.5$).

As for the power properties of the tests, we consider the following four
data generating processes:

DGP B1: $Y_i = \Phi((X_i - 0.5)/\sqrt{0.2}) + \Phi((Z_i - 0.5)/\sqrt{0.2}) + \varepsilon_i$,

DGP B2: $Y_i = \Phi((X_i - 0.5)/\sqrt{0.2}) + \sin(5Z_i) + \varepsilon_i$,

Table 2

Rejection probabilities under the alternative hypothesis with nominal level 5%

                                DGP B1                DGP B2
α     h                     Exp.      Ind.        Exp.      Ind.
0.2   0.25 × n^{-1/5}     0.7705    0.7560      0.9960    0.9915
      0.50 × n^{-1/5}     0.8015    0.7795      0.9995    0.9960
      1.00 × n^{-1/5}     0.8250    0.8055      1.0000    0.9975
      2.00 × n^{-1/5}     0.8600    0.8500      1.0000    0.9975
0.5   0.25 × n^{-1/5}     0.4275    0.4140      0.9400    0.7685
      0.50 × n^{-1/5}     0.4505    0.4440      0.9575    0.8035
      1.00 × n^{-1/5}     0.4785    0.4855      0.9725    0.8340
      2.00 × n^{-1/5}     0.7620    0.7640      0.9695    0.8230

                                DGP B3                DGP B4
α     h                     Exp.      Ind.        Exp.      Ind.
0.2   0.25 × n^{-1/5}     0.0895    0.1000      0.8945    0.5745
      0.50 × n^{-1/5}     0.0925    0.0945      0.9285    0.6225
      1.00 × n^{-1/5}     0.0830    0.0910      0.9405    0.6570
      2.00 × n^{-1/5}     0.0485    0.0725      0.9470    0.6850
0.5   0.25 × n^{-1/5}     0.0680    0.0770      0.7395    0.4480
      0.50 × n^{-1/5}     0.0585    0.0680      0.7935    0.4745
      1.00 × n^{-1/5}     0.0495    0.0650      0.8255    0.5000
      2.00 × n^{-1/5}     0.1270    0.0400      0.8555    0.4815

DGP B3: $Y_i = \sin(5X_i) + \Phi((Z_i - 0.5)/\sqrt{0.2}) + \varepsilon_i$,

DGP B4: $Y_i = \Phi((X_i - 0.5)/\sqrt{0.2}) \times \sin(5Z_i)$

The results are presented in Table 2. We report the results for the nominal
level of 5%. The sample sizes were again 100. For all the cases considered,
increasing the correlation between $X_i$ and $Z_i$ (changing $\alpha$ from 0.2 to 0.5)
decreases the rejection probabilities. This makes sense because as $X_i$ and $Z_i$
become more dependent, the DGP becomes closer to the null hypothesis.

While the rejection probabilities under DGPs B1 and B2 are reasonably
high, the rejection probabilities are much higher in the case of DGP B2
than in the case of DGP B1. In the case of DGP B1, $Y_i$ is monotone both
in $X_i$ and $Z_i$, and is linear in $X_i$. Hence conditional on $X_i$, the presence
of the term involving $Z_i$ in the regression model is harder to detect than
in the case of DGP B2 where $Y_i$ is not monotone in $Z_i$. The results are
similar regardless of whether we use $\beta_u(U) = \exp(Uu)$ or $\beta_u(U) = 1\{U \leq u\}$
in constructing test statistics. However, interestingly, when the roles of $X_i$
and $Z_i$ are interchanged as in DGP B3, the tests have very weak power. In
simulation studies which are not reported here, we found that the empirical
power of the tests was around 75%-95% when the component involving $Z_i$
in DGP B3 was taken to be $\sin(5Z_i)$ or $\cos(5Z_i)$ and $\alpha$ was set to be 0.2.
Hence, while the type of nonlinearity between $Y_i$ and $X_i$ plays a crucial role
for power properties, the properties also significantly hinge on how $Z_i$ is
related to $Y_i$.

Under DGP B4, the rejection probabilities are reasonably high. It is also
interesting to observe that under DGP B4, the power properties are
significantly better for the choice of $\beta_u(U) = \exp(Uu)$ than for the choice of
$\beta_u(U) = 1\{U \leq u\}$. This result illustrates the fact that the choice of $\beta_u(U)$
often plays a significant role in determining the power properties of the test.

5.2. Conditional independence with binary $Z_i$. Tests of conditional
independence in the case of binary $Z_i$ can be used for program evaluations.
For example, suppose $Z_i$ is the binary decision of an individual's treatment
which depends on the single index of covariates, $\lambda_{\theta_0}(X_i)$. Then, conditional
independence $Y_i \perp\!\!\!\perp Z_i \mid \lambda_{\theta_0}(X_i)$ is a testable implication of the absence of
treatment effects under the strong ignorability assumption. In the simulation
study, we specified the index as $\lambda_{\theta_0}(X_i) = 0.5 \times (\theta_{00} + \theta_{01}X_{1i} + \theta_{02}X_{2i})$,
where $X_i = (X_{1i}, X_{2i})$, $X_{1i} \sim$ Unif$[0,1] + 0.2$ and $X_{2i} \sim$ Unif$[0,1] - 0.2$. Here
$\theta_{01} = \theta_{02} = 1$ and $\theta_{00} = 0$. The treatment decision $Z_i$ was modeled as

$Z_i = 1\{\lambda_{\theta_0}(X_i) \geq \eta_i\}$, $\eta_i \sim N(0,1)$.

First, we discuss size properties of the bootstrap tests based on
$\beta_u(U) = \exp(Uu)$ and on $\beta_u(U) = 1\{U \leq u\}$. For a specification of the null hypothesis,
the variable $Y_i$ was specified as

DGP C: $Y_i = 2\Phi(\lambda_{\theta_0}(X_i)/\sqrt{0.2}) + \varepsilon_i$, $\varepsilon_i \sim N(0,1)$.
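
The binary-treatment design under the null can be simulated as in the Python sketch below. It reads the treatment rule as a probit-type threshold crossing and the outcome scale as $\sqrt{0.2}$, following the displayed formulas as we read them; the function names `index` and `simulate_dgp_c` are hypothetical.

```python
import math
import random


def normal_cdf(t):
    """Standard normal c.d.f., written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def index(x1, x2, theta=(0.0, 1.0, 1.0)):
    """Single index 0.5 * (theta_00 + theta_01 * x1 + theta_02 * x2)."""
    return 0.5 * (theta[0] + theta[1] * x1 + theta[2] * x2)


def simulate_dgp_c(n, seed=0):
    """Draw n observations (Y, Z, X1, X2) under DGP C.

    Z is a threshold-crossing indicator of the index against a N(0,1) draw,
    and Y depends on the covariates only through the index.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x1 = rng.random() + 0.2
        x2 = rng.random() - 0.2
        lam = index(x1, x2)
        z = 1 if lam >= rng.gauss(0.0, 1.0) else 0
        y = 2.0 * normal_cdf(lam / math.sqrt(0.2)) + rng.gauss(0.0, 1.0)
        sample.append((y, z, x1, x2))
    return sample


if __name__ == "__main__":
    sample = simulate_dgp_c(100)
    print("share treated:", sum(obs[1] for obs in sample) / len(sample))
```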

For the construction of the test statistic, we first estimated $\theta_0$ using the
MLE to obtain $\hat\theta$. Using this estimator $\hat\theta$, we constructed $\hat{U}_i$. And then
we obtained $\hat{p}_{z,i}(\hat{U}_i)$ and $\hat{Y}_i$ using kernel estimation with the kernel
$K(u) = (15/16)(1 - u^2)^2 1\{|u| \leq 1\}$ as before. As for taking the Kolmogorov-Smirnov
functional, we used $20^2$ equally spaced grid points in $[0,1]^2$.

The results are presented in Table 3. The number $h_1$ represents the bandwidth
for $\hat{p}_{z,i}(\hat{U}_i)$ and $h_2$ that for $\hat{Y}_i$. The size properties of the tests are fairly
good. The rejection probabilities are mostly close to the nominal level,
despite the fact that the test statistics involve multiple nonparametric
estimators and an empirical probability integral transform and
that the sample size is only 100. The performance of the tests is quite
stable over the bandwidth choices and is good regardless of the choice of
$\beta_u(U) = \exp(Uu)$ or $\beta_u(U) = 1\{U \leq u\}$.

Let us turn to the power properties of the tests. For this, the following
specifications in the alternative hypothesis were used:

DGP D1: $Y_i = 0.5\lambda_{\theta_0}(X_i) + \kappa s(Z_i, X_{1i}, X_{2i}) + \varepsilon_i$, $\varepsilon_i \sim N(0,1)$,

DGP D2: $Y_i = 2\Phi((\lambda_{\theta_0}(X_i) + \kappa s(Z_i, X_{1i}, X_{2i}))/\sqrt{0.2}) + \varepsilon_i$, $\varepsilon_i \sim N(0,1)$,

Table 3

Rejection probabilities under the null hypothesis when Z is binary

                                                  Exp.                          Ind.
h_1                h_2                      1%       5%       10%         1%       5%       10%
0.25 × n^{-1/5}    0.25 × n^{-1/5}        0.0095   0.0430   0.0920      0.0155   0.0495   0.1025
                   0.50 × n^{-1/5}        0.0085   0.0355   0.0875      0.0120   0.0500   0.1000
                   1.00 × n^{-1/5}        0.0055   0.0320   0.0735      0.0080   0.0460   0.0975
                   2.00 × n^{-1/5}        0.0060   0.0345   0.0815      0.0075   0.0430   0.0930
0.50 × n^{-1/5}    0.25 × n^{-1/5}        0.0115   0.0550   0.1090      0.0150   0.0650   0.1150
                   0.50 × n^{-1/5}        0.0110   0.0575   0.1130      0.0135   0.0620   0.1240
                   1.00 × n^{-1/5}        0.0145   0.0600   0.1050      0.0160   0.0600   0.1080
                   2.00 × n^{-1/5}        0.0100   0.0525   0.1050      0.0170   0.0560   0.1150
1.00 × n^{-1/5}    0.25 × n^{-1/5}        0.0110   0.0555   0.1045      0.0135   0.0590   0.1130
                   0.50 × n^{-1/5}        0.0130   0.0535   0.1090      0.0145   0.0575   0.1140
                   1.00 × n^{-1/5}        0.0135   0.0540   0.1125      0.0140   0.0550   0.1125
                   2.00 × n^{-1/5}        0.0140   0.0540   0.1055      0.0140   0.0620   0.1095
2.00 × n^{-1/5}    0.25 × n^{-1/5}        0.0100   0.0455   0.0930      0.0100   0.0435   0.0935
                   0.50 × n^{-1/5}        0.0070   0.0440   0.0960      0.0115   0.0530   0.0965
                   1.00 × n^{-1/5}        0.0090   0.0475   0.0990      0.0090   0.0475   0.1015
                   2.00 × n^{-1/5}        0.0145   0.0525   0.1080      0.0135   0.0525   0.1080

where $s(Z_i, X_{1i}, X_{2i}) = Z_i\{1 + |X_{1i}| + |X_{2i}|\}$. In the example of program
evaluations, the second term, $\kappa s(Z_i, X_{1i}, X_{2i})$, accounts for the path through
which the treatment decision affects the outcome $Y_i$ after conditioning on $\lambda_{\theta_0}(X_i)$.
This term involves $Z_i$ and the covariate vector $X_i$ nonlinearly. The number
$\kappa$ was chosen from $\{0.5, 1\}$.

The rejection probabilities under the alternative hypothesis are presented
in Tables 4 and 5. The rejection probabilities against the alternatives in DGP
D1 are fairly good. It is interesting to note that the rejection probabilities
depend on the choice of bandwidths. The performance is almost the same
for the choice of $\beta_u(U) = \exp(Uu)$ or $\beta_u(U) = 1\{U \le u\}$.
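
To make the two weight functions and the bandwidth labels of the tables concrete, the following is a minimal sketch in Python. The function names and the sample size `n = 200` are hypothetical placeholders chosen for illustration, and the bandwidth rule $c \times n^{-1/5}$ is taken from the row labels of Tables 4 and 5.

```python
import numpy as np

def beta_exp(U, u):
    """Exponential weight function: beta_u(U) = exp(U * u)."""
    return np.exp(np.asarray(U, dtype=float) * u)

def beta_ind(U, u):
    """Indicator weight function: beta_u(U) = 1{U <= u}."""
    return (np.asarray(U, dtype=float) <= u).astype(float)

def bandwidth(c, n, rate=-1.0 / 5.0):
    """Bandwidth of the form c * n**rate, as in the table row labels."""
    return c * n ** rate

# Hypothetical inputs, for illustration only.
U = np.array([0.1, 0.5, 0.9])
w_exp = beta_exp(U, 0.5)
w_ind = beta_ind(U, 0.5)
h_grid = [bandwidth(c, 200) for c in (0.25, 0.5, 1.0, 2.0)]
```

Both weights are monotone and bounded in $U$ on $[0,1]$, which is the property the proofs in the Appendix rely on.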

The numbers in Table 4 show an interesting result: the bandwidth
choice for $\hat p_{z,i}(U_i)$ is more important for the power of the test
than the bandwidth for $Y_i$. When there is more smoothing in the estimation
of $\hat p_{z,i}(U_i)$ within the range of bandwidths considered, the rejection
probability improves. However, the rejection probabilities are not as sensitive
to the bandwidth choices for $Y_i$. Similar observations are made for the
case with DGP D2, where $Y_i$ relies nonlinearly on the deviation component
$\kappa s(Z_i, X_{1i}, X_{2i})$. In this case, the rejection probabilities are mostly better
when $\hat p_{z,i}(U_i)$ involves more smoothing given the range of the bandwidths.
However, the nonlinearity has an overall effect of reducing the rejection
probabilities. It is also interesting to note that the power properties in this case
differ between the choices $\beta_u(U) = \exp(Uu)$ and $\beta_u(U) = 1\{U \le u\}$.
The choice of indicator functions yielded a test with better power properties
than the choice of exponential functions in this set-up.

Table 4

Rejection probabilities under the alternative hypothesis (DGP D1) when Z is binary:
nominal size = 5%

                                          κ = 0.5               κ = 1.0
h_1                h_2                 Exp.      Ind.        Exp.      Ind.
0.25 × n^{-1/5}    0.25 × n^{-1/5}    0.6075    0.6955      0.9985    0.9985
                   0.50 × n^{-1/5}    0.6090    0.7010      1.0000    1.0000
                   1.00 × n^{-1/5}    0.6105    0.7020      1.0000    1.0000
                   2.00 × n^{-1/5}    0.5950    0.6850      1.0000    1.0000
0.50 × n^{-1/5}    0.25 × n^{-1/5}    0.8850    0.9205      0.6715    0.7605
                   0.50 × n^{-1/5}    0.9015    0.9335      0.6675    0.7590
                   1.00 × n^{-1/5}    0.9065    0.9385      0.6695    0.7590
                   2.00 × n^{-1/5}    0.9055    0.9365      0.6685    0.7585
1.00 × n^{-1/5}    0.25 × n^{-1/5}    0.9545    0.9455      0.9530    0.9815
                   0.50 × n^{-1/5}    0.9690    0.9630      0.9535    0.9815
                   1.00 × n^{-1/5}    0.9790    0.9740      0.9540    0.9815
                   2.00 × n^{-1/5}    0.9850    0.9850      0.9550    0.9810
2.00 × n^{-1/5}    0.25 × n^{-1/5}    0.9465    0.9320      1.0000    1.0000
                   0.50 × n^{-1/5}    0.9665    0.9565      1.0000    1.0000
                   1.00 × n^{-1/5}    0.9885    0.9775      1.0000    1.0000
                   2.00 × n^{-1/5}    0.9980    0.9965      1.0000    1.0000

We summarize the findings from the simulation results as follows. First, the size
properties of the bootstrap methods based on the distribution-free tests
are fairly good. Second, the power properties tend to depend on the choice
between $\beta_u(U) = \exp(Uu)$ and $\beta_u(U) = 1\{U \le u\}$ as well as on the bandwidth
choices. Third, there are alternatives against which the tests have only trivial power.
This finding appears to be consistent with the point made by Janssen (2000)
that omnibus tests have nearly trivial asymptotic power against all the local
alternatives, except for those contained in a finite-dimensional space.
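
When reading the size results, it can help to compare an empirical rejection frequency with the Monte Carlo sampling error around the nominal level. The sketch below assumes a binomial model for the rejection count. The replication count `REPS = 2000` is a hypothetical placeholder (this section does not state it), and the function names are illustrative.

```python
import math

def mc_standard_error(p, reps):
    """Binomial standard error of a rejection frequency with true level p."""
    return math.sqrt(p * (1.0 - p) / reps)

def within_band(p_hat, nominal, reps, z=1.96):
    """True if p_hat lies within z standard errors of the nominal level."""
    return abs(p_hat - nominal) <= z * mc_standard_error(nominal, reps)

REPS = 2000  # hypothetical number of Monte Carlo replications
se_5 = mc_standard_error(0.05, REPS)
```

Under this assumed replication count, an empirical size of 0.0495 is well inside the band around 5%, while 0.0355 is outside it, which is one way to quantify "fairly good" size control across bandwidths.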

Table 5

Rejection probabilities under the alternative hypothesis (DGP D2) when Z is binary:
nominal size = 5%

                                          κ = 0.5               κ = 1.0
h_1                h_2                 Exp.      Ind.        Exp.      Ind.
0.25 × n^{-1/5}    0.25 × n^{-1/5}    0.2200    0.2865      0.3040    0.4175
                   0.50 × n^{-1/5}    0.2225    0.2895      0.3020    0.4220
                   1.00 × n^{-1/5}    0.2120    0.2870      0.3020    0.4260
                   2.00 × n^{-1/5}    0.2150    0.2795      0.2980    0.4160
0.50 × n^{-1/5}    0.25 × n^{-1/5}    0.3245    0.3745      0.4350    0.5560
                   0.50 × n^{-1/5}    0.3500    0.4075      0.4675    0.5785
                   1.00 × n^{-1/5}    0.3405    0.4100      0.4650    0.5975
                   2.00 × n^{-1/5}    0.3460    0.4125      0.4615    0.5915
1.00 × n^{-1/5}    0.25 × n^{-1/5}    0.3490    0.3755      0.4680    0.5685
                   0.50 × n^{-1/5}    0.3675    0.4135      0.5020    0.6000
                   1.00 × n^{-1/5}    0.3820    0.4420      0.5205    0.6320
                   2.00 × n^{-1/5}    0.3925    0.4610      0.5275    0.6525
2.00 × n^{-1/5}    0.25 × n^{-1/5}    0.3275    0.3600      0.4750    0.5535
                   0.50 × n^{-1/5}    0.3575    0.3960      0.5065    0.5815
                   1.00 × n^{-1/5}    0.3840    0.4445      0.5340    0.6360
                   2.00 × n^{-1/5}    0.4425    0.5110      0.6040    0.6995

APPENDIX: PROOFS

Throughout the proofs, the notation $C$ denotes a positive absolute constant,
assuming different values in different contexts. For a class $\mathcal{F}$ of measurable
functions, $N(\varepsilon, \mathcal{F}, L_q(P))$ and $N_{[\,]}(\varepsilon, \mathcal{F}, L_q(P))$ denote the covering and
bracketing numbers of $\mathcal{F}$ with respect to the $L_q(P)$-norm [see van der Vaart
and Wellner (1996) for their definitions]. Similarly, we define $N(\varepsilon, \mathcal{F}, \|\cdot\|_\infty)$
and $N_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|_\infty)$ to be the covering and bracketing numbers with respect
to $\|\cdot\|_\infty$.

A.1. Preliminary results.

Lemma A1. Let $\Lambda$ be a class of measurable functions such that for each
$\lambda \in \Lambda$, $\lambda(X)$ is a continuous random variable with a density function (under $P$) bounded by
$M > 0$. Let $\mathcal{T}$ be a class of functions of bounded variation that take values
in $[-M, M]$. Then for the class $\mathcal{G} = \{\tau \circ \lambda : (\tau, \lambda) \in \mathcal{T} \times \Lambda\}$, it is satisfied that
for any $q \ge 1$,

$$\log N_{[\,]}(C_2\varepsilon, \mathcal{G}, L_q(P)) \le \log N(\varepsilon^q, \Lambda, \|\cdot\|_\infty) + C_1/\varepsilon,$$

where $C_1$ and $C_2$ are positive constants depending only on $q$ and $M$.

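Proof. Let $F_\lambda$ be the c.d.f. of $\lambda(X)$. For any $\lambda_1, \lambda_2 \in \Lambda$,

$$\sup_x |F_{\lambda_1}(\lambda_1(x)) - F_{\lambda_2}(\lambda_2(x))| \le M\|\lambda_1 - \lambda_2\|_\infty,$$

because the density of $\lambda(X)$, $\lambda \in \Lambda$, is uniformly bounded. From now on,
we identify $\lambda(x)$ with $F_\lambda(\lambda(x))$ without loss of generality so that $\lambda(X)$ is
uniformly distributed on $[0,1]$. Since $\mathcal{T} \subset \mathcal{T}^+ - \mathcal{T}^-$ where $\mathcal{T}^+$ and $\mathcal{T}^-$ are
collections of uniformly bounded, monotone functions, and we can write $\mathcal{T}^+$
and $\mathcal{T}^-$ as unions of increasing functions and decreasing functions, we lose
no generality by assuming that each $\tau \in \mathcal{T}$ is decreasing. Hence by the result
of Birman and Solomjak (1967), for any $q \ge 1$

$$\log N_{[\,]}(\varepsilon, \mathcal{T}, L_q(P)) \le \frac{C_2}{\varepsilon} \tag{13}$$

for a constant $C_2 > 0$ that does not depend on $P$.

Choose $\{\lambda_1, \ldots, \lambda_{N_1}\}$ such that any $\lambda \in \Lambda$ is assigned with $\lambda_j$ satisfying
$\|\lambda_j - \lambda\|_\infty < \varepsilon^q/2$, and take an integer $M_\varepsilon \in [2\varepsilon^{-q} + 1, 2\varepsilon^{-q} + 2)$ and a set
$\{c_1, \ldots, c_{M_\varepsilon}\}$ such that $c_1 = 0$ and

$$c_{m+1} = c_m + \varepsilon^q/2, \qquad m = 1, \ldots, M_\varepsilon - 1,$$

so that $0 = c_1 < c_2 < \cdots < c_{M_\varepsilon - 1} < 1 \le c_{M_\varepsilon}$. Define

$$\tilde\lambda_j(x) = c_m \quad\text{when } \lambda_j(x) \in [c_m, c_{m+1}), \text{ for some } m \in \{1, 2, \ldots, M_\varepsilon - 1\}.$$

For each $j_1 \in \{1, \ldots, N_1\}$, let $P_{j_1}$ be the distribution of $\tilde\lambda_{j_1}(X)$ under $P$.
Then choose $\{(\tau_k, \Delta_k)\}_{k=1}^{N_2}$ such that any $\tau \in \mathcal{T}$ is assigned with a bracket
$(\tau_{j_2}, \Delta_{j_2})$ satisfying $|\tau(\bar\lambda) - \tau_{j_2}(\bar\lambda)| \le \Delta_{j_2}(\bar\lambda)$ and $\int \Delta_{j_2}(\bar\lambda)^q P_{j_1}(d\bar\lambda) < \varepsilon^q$.

Now, take any $g = \tau \circ \lambda \in \mathcal{G}$ and let $\lambda_{j_1}$ and $\tau_{j_2}$ be such that $\|\lambda_{j_1} - \lambda\|_\infty <
\varepsilon^q/2$ and $|\tau - \tau_{j_2}| \le \Delta_{j_2}$ with $\int \Delta_{j_2}(\bar\lambda)^q P_{j_1}(d\bar\lambda) < \varepsilon^q$. Fix these $j_1$ and $j_2$ and
extend the domain of $\Delta_{j_2}$ to $\mathbf{R}$ by setting $\Delta_{j_2}(\bar\lambda) = 0$ for all $\bar\lambda \in \mathbf{R} \setminus [0, 1]$.

Note that

$$\begin{aligned}
|g(x) - (\tau_{j_2} \circ \tilde\lambda_{j_1})(x)|
&\le |(\tau \circ \lambda)(x) - (\tau \circ \tilde\lambda_{j_1})(x)| + |(\tau \circ \tilde\lambda_{j_1})(x) - (\tau_{j_2} \circ \tilde\lambda_{j_1})(x)| \\
&\le |(\tau \circ \lambda)(x) - (\tau \circ \tilde\lambda_{j_1})(x)| + (\Delta_{j_2} \circ \tilde\lambda_{j_1})(x).
\end{aligned} \tag{14}$$

The range of $\tilde\lambda_{j_1}$ is finite and $\|\lambda - \tilde\lambda_{j_1}\|_\infty \le \|\lambda - \lambda_{j_1}\|_\infty + \|\lambda_{j_1} - \tilde\lambda_{j_1}\|_\infty \le \varepsilon^q$.
Since $\tau$ is decreasing, $|(\tau \circ \lambda)(x) - (\tau \circ \tilde\lambda_{j_1})(x)|$ is bounded by
$\tau(\tilde\lambda_{j_1}(x) - \varepsilon^q) - \tau(\tilde\lambda_{j_1}(x) + \varepsilon^q)$, or by

$$\tau_{j_2}(\tilde\lambda_{j_1}(x) - \varepsilon^q) - \tau_{j_2}(\tilde\lambda_{j_1}(x) + \varepsilon^q) + \Delta_{j_2}(\tilde\lambda_{j_1}(x) - \varepsilon^q) + \Delta_{j_2}(\tilde\lambda_{j_1}(x) + \varepsilon^q).$$

Write the difference $\tau_{j_2}(\tilde\lambda_{j_1}(x) - \varepsilon^q) - \tau_{j_2}(\tilde\lambda_{j_1}(x) + \varepsilon^q)$ as $A_1(x) + A_2(x) +
A_3(x) + A_4(x)$, where

$$\begin{aligned}
A_1(x) &= \tau_{j_2}(\tilde\lambda_{j_1}(x) - \varepsilon^q) - \tau_{j_2}(\tilde\lambda_{j_1}(x) - \varepsilon^q/2), \\
A_2(x) &= \tau_{j_2}(\tilde\lambda_{j_1}(x) - \varepsilon^q/2) - \tau_{j_2}(\tilde\lambda_{j_1}(x)), \\
A_3(x) &= \tau_{j_2}(\tilde\lambda_{j_1}(x)) - \tau_{j_2}(\tilde\lambda_{j_1}(x) + \varepsilon^q/2)
\end{aligned}$$

and

$$A_4(x) = \tau_{j_2}(\tilde\lambda_{j_1}(x) + \varepsilon^q/2) - \tau_{j_2}(\tilde\lambda_{j_1}(x) + \varepsilon^q).$$

Due to the construction of $\tilde\lambda_{j_1}(x)$, we write $A_1(x)$ as

$$\sum_{m=1}^{M_\varepsilon - 1} \{\tau_m^U(j_2) - \tau_m^L(j_2)\} \times 1\{c_m \le \lambda_{j_1}(x) < c_{m+1}\},$$

where $\tau_m^U(j_2) = \tau_{j_2}(c_m - \varepsilon^q)$ and $\tau_m^L(j_2) = \tau_{j_2}(c_m - \varepsilon^q/2)$. Since $\tau_{j_2}$ is decreasing,
$\tau_m^L(j_2) \le \tau_m^U(j_2)$, and since $c_{m+1} = c_m + \varepsilon^q/2$,

$$\tau_{m+1}^U(j_2) = \tau_{j_2}(c_{m+1} - \varepsilon^q) = \tau_{j_2}(c_m - \varepsilon^q/2) = \tau_m^L(j_2), \qquad m = 1, \ldots, M_\varepsilon - 1.$$

Hence, we conclude

$$\tau_{M_\varepsilon - 1}^L(j_2) \le \cdots \le \tau_2^L(j_2) = \tau_3^U(j_2) \le \tau_1^L(j_2) = \tau_2^U(j_2) \le \tau_1^U(j_2).$$

Suppose that $\tau_1^U(j_2) = \tau_{M_\varepsilon - 1}^L(j_2)$. Then $A_1(x) = 0$ and the $L_q(P)$-norm of $A_1$
is trivially zero. Suppose that $\tau_1^U(j_2) > \tau_{M_\varepsilon - 1}^L(j_2)$. Since $\tau_{j_2}$ is uniformly
bounded, we have $\tau_1^U(j_2) - \tau_{M_\varepsilon - 1}^L(j_2) < C < \infty$ for some $C > 0$. Define

$$\tilde A_{j_1,j_2}(x) = \sum_{m=1}^{M_\varepsilon - 1} \frac{\tau_m^U(j_2) - \tau_m^L(j_2)}{\tau_1^U(j_2) - \tau_{M_\varepsilon - 1}^L(j_2)}\, 1\{c_m \le \lambda_{j_1}(x) < c_{m+1}\}.$$

Let $p_m(j_1) = P\{c_m \le \lambda_{j_1}(X) < c_{m+1}\}$. Since $\tilde A_{j_1,j_2}(x) \le 1$, $\tilde A_{j_1,j_2}^q(x) \le
\tilde A_{j_1,j_2}(x)$ so that

$$E\tilde A_{j_1,j_2}^q(X) \le \sum_{m=1}^{M_\varepsilon - 1} \left\{\frac{\tau_m^U(j_2) - \tau_m^L(j_2)}{\tau_1^U(j_2) - \tau_{M_\varepsilon - 1}^L(j_2)}\right\} \times p_m(j_1) \le \varepsilon^q/2,$$

because $p_m(j_1) \le \varepsilon^q/2$ for $m \in \{1, \ldots, M_\varepsilon - 1\}$. Thus the $L_q(P)$-norm of $A_1$
is bounded by $C\varepsilon$. We can deal with the functions $A_j(x)$, $j = 2, 3, 4$, similarly
by redefining $\tau_m^U(j_2)$ and $\tau_m^L(j_2)$.

From (14) we can bound $|g(x) - (\tau_{j_2} \circ \tilde\lambda_{j_1})(x)|$ by

$$\begin{aligned}
\Delta^*_{j_1,j_2}(x) \equiv{}& A_1(x) + A_2(x) + A_3(x) + A_4(x) \\
&+ (\Delta_{j_2} \circ \tilde\lambda_{j_1})(x) + \Delta_{j_2}(\tilde\lambda_{j_1}(x) - \varepsilon^q) + \Delta_{j_2}(\tilde\lambda_{j_1}(x) + \varepsilon^q).
\end{aligned} \tag{15}$$

Now, let us compute $[E\{\Delta^*_{j_1,j_2}(X)\}^q]^{1/q}$. The $L_q(P)$-norm of the first four
functions is bounded by $C\varepsilon$, as we proved before. By the choice of $\Delta_{j_2}$,

$$E[\Delta_{j_2}^q(\tilde\lambda_{j_1}(X))] = \int \Delta_{j_2}(\bar\lambda)^q P_{j_1}(d\bar\lambda) < \varepsilon^q.$$

Let us turn to the last two terms in (15). Note that

$$\begin{aligned}
E[\Delta_{j_2}^q(\tilde\lambda_{j_1}(X) - \varepsilon^q)] &= \sum_{m=1}^{M_\varepsilon - 1} \Delta_{j_2}^q(c_m - \varepsilon^q)\, p_m(j_1) \\
&\le \sum_{m=1}^{M_\varepsilon - 1} \Delta_{j_2}^q(c_m - \varepsilon^q)\, \varepsilon^q/2 = \sum_{m=1}^{M_\varepsilon - 3} \Delta_{j_2}^q(c_m)\, \varepsilon^q/2 \\
&\le E[\Delta_{j_2}^q(\tilde\lambda_{j_1}(X))] < \varepsilon^q.
\end{aligned}$$

The second equality is due to our setting $\Delta_{j_2}(c) = 0$ for $c \in \mathbf{R} \setminus [0,1]$. The
second to the last inequality is due to the fact that $p_m(j_1) = \varepsilon^q/2$ for all
$m \in \{1, \ldots, M_\varepsilon - 2\}$. Similarly $E[\Delta_{j_2}^q(\tilde\lambda_{j_1}(X) + \varepsilon^q)] < \varepsilon^q$. Combining these
results, $E[\{\Delta^*_{j_1,j_2}(X)\}^q] \le C_1^q\varepsilon^q$, for some constant $C_1 > 0$, yielding the result
that

$$\log N_{[\,]}(C_1\varepsilon, \mathcal{G}, L_q(P)) \le \log N(\varepsilon^q/2, \Lambda, \|\cdot\|_\infty) + C_2/\varepsilon.$$

With redefinitions of constants and $\varepsilon$, this completes the proof. □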
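
The grid $\{c_m\}$ and the discretization $\tilde\lambda_j$ used in the proof can be checked numerically. The following is a minimal sketch, assuming index values already rescaled to $[0,1)$; the function names `grid` and `discretize` are illustrative and not part of the paper.

```python
import math
import numpy as np

def grid(eps, q):
    """Return (M_eps, c) with c_1 = 0 and c_{m+1} = c_m + eps**q / 2."""
    step = eps ** q / 2.0
    lower = 2.0 * eps ** (-q) + 1.0
    M_eps = math.ceil(lower)          # an integer in [lower, lower + 1)
    c = step * np.arange(M_eps)       # c[0] = c_1, ..., c[M_eps - 1] = c_{M_eps}
    return M_eps, c

def discretize(values, c):
    """Map each value in [c_m, c_{m+1}) to the left grid point c_m."""
    idx = np.searchsorted(c, values, side="right") - 1
    return c[idx]

M_eps, c = grid(0.5, 2)
vals = np.array([0.0, 0.3, 0.99])
approx = discretize(vals, c)
```

With `eps = 0.5` and `q = 2` the step is `0.125`, the grid has nine points ending at `1.0`, and every discretized value lies within one step below the original, which matches the bound $\|\lambda_{j_1} - \tilde\lambda_{j_1}\|_\infty \le \varepsilon^q/2$ used after (14).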

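Given $x^{(n)} = (x_j)_{j=1}^n \in \mathbf{R}^{nd_X}$, let

$$G_{n,\lambda}(\cdot\,; x^{(n)}) = \frac{1}{n}\sum_{j=1}^n 1\{\lambda(x_j) \le \cdot\}. \tag{16}$$

Lemma A1 yields the following bracketing entropy bound by taking $\tau =
\beta_u \circ G_{n,\lambda}$. Certainly, this $\tau$ is of bounded variation, because $G_{n,\lambda}$ is increasing.

Corollary A1. Let $\Lambda$ and $M$ be as in Lemma A1, and for $\beta_u$ in Assumption
2(i) let

$$\mathcal{B}_n = \{\beta_u(G_{n,\lambda}(\lambda(\cdot); x^{(n)})) : (u, \lambda, x^{(n)}) \in [0, 1] \times \Lambda \times \mathbf{R}^{nd_X}\}.$$

Then $\log N_{[\,]}(C_2\varepsilon, \mathcal{B}_n, L_q(P)) \le \log N(\varepsilon^q, \Lambda, \|\cdot\|_\infty) + C_1/\varepsilon$, for any $q \ge 1$,
where $C_1$ and $C_2$ are positive constants depending only on $q$ and $M$.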
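
As an illustration of the empirical distribution function in (16), the sketch below builds $G_{n,\lambda}(\cdot\,; x^{(n)})$ for a given index function. The linear index `lam` with coefficients $(1, 2)$ and the four sample points are hypothetical and serve only to show the step-function behavior.

```python
import numpy as np

def make_G(lam, x_sample):
    """Return t -> (1/n) * #{j : lam(x_j) <= t}, as in (16)."""
    lam_vals = np.sort(np.array([lam(x) for x in x_sample], dtype=float))
    n = lam_vals.size

    def G(t):
        # side="right" counts the values less than or equal to t
        return np.searchsorted(lam_vals, t, side="right") / n

    return G

# Hypothetical single index and sample, for illustration only.
lam = lambda x: x[0] + 2.0 * x[1]
x_sample = [(0, 0), (1, 0), (0, 1), (1, 1)]   # index values 0, 1, 2, 3
G = make_G(lam, x_sample)
```

Because `G` is nondecreasing and takes values in $[0,1]$, composing it with a monotone bounded $\beta_u$ gives a function of bounded variation, which is what the corollary exploits.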

The following lemma is useful for establishing a bracketing entropy bound
for a class to which realizations of conditional c.d.f. estimators belong.

Lemma A2. We introduce three classes of functions. First, let $\mathcal{F}_n$ be
a sequence of classes of maps $\phi(\cdot, \cdot) : \mathbf{R} \times S_n \to [0,1]$, such that (a) $S_n \subset
[-s_n, s_n]$, $s_n > 1$, (b) for each $v \in S_n$, $\phi(\cdot, v)$ is of bounded variation and (c)
for each $\varepsilon > 0$,

$$\sup_{(y,v) \in \mathbf{R} \times S_n}\ \sup_{\eta \in [0,\varepsilon]} |\phi(y, v + \eta) - \phi(y, v - \eta)| < M_n\varepsilon \tag{17}$$

for some sequence $M_n > 1$. Second, let $\mathcal{G}$ be a class of measurable functions
$G : \mathbf{R}^{d_X} \to S_n$. Finally, let $\mathcal{J}_n^{\mathcal{G}} = \{\phi(\cdot, G(\cdot)) : (\phi, G) \in \mathcal{F}_n \times \mathcal{G}\}$.
Then for any probability $P$, and for any $q \ge 1$,

$$\begin{aligned}
\log N_{[\,]}(\varepsilon, \mathcal{J}_n^{\mathcal{G}}, L_q(P))
\le{}& C + C\log(2s_nM_n + \varepsilon) \\
&+ C\{(2s_nM_n + \varepsilon)^{1/q}\varepsilon^{-(q+1)/q} - \log(\varepsilon) + \log N_{[\,]}(C\varepsilon/M_n, \mathcal{G}, L_q(P))\}
\end{aligned}$$

for some $C > 0$.

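Proof. Fix $\varepsilon > 0$ and take a partition $S_n = \bigcup_{k=1}^{J_\varepsilon} B(b_k)$ where $B(b_k)$ is a
set contained in an $\varepsilon/M_n$-interval centered at $b_k$, and $J_\varepsilon \le 2s_nM_n/\varepsilon + 1$. Let
$\mathcal{F}_n(b_k) = \{\phi(\cdot, b_k) : \phi \in \mathcal{F}_n\}$. For each $b_k$, take $\{(f_{k,j}, \Delta_{k,j})\}_{j=1}^{N_k}$ such that to
any $f \in \mathcal{F}_n(b_k)$ corresponds $(f_{k,j}, \Delta_{k,j})$ such that $|f(y) - f_{k,j}(y)| \le \Delta_{k,j}(y)$
and $\int \Delta_{k,j}^q(y)P(dy) < \varepsilon^{q+1}/(2s_nM_n + \varepsilon)$.

Given $\phi \in \mathcal{F}_n$, we let $f_j(y, v) = \sum_{k=1}^{J_\varepsilon} f_{k,j}(y)A_k(v)$, where $A_k(v) = 1\{v \in
B(b_k)\}$ and $f_{k,j}(y)$ is such that $|\phi(y, b_k) - f_{k,j}(y)| \le \Delta_{k,j}(y)$ and

$$\int \Delta_{k,j}^q(y)P(dy) < \varepsilon^{q+1}/(2s_nM_n + \varepsilon).$$

Since $\mathcal{F}_n(b_k)$ is a uniformly bounded class of bounded variation functions,
the smallest number of such $(k, j)$'s is bounded by

$$J_\varepsilon \exp\left(\frac{C(2s_nM_n + \varepsilon)^{1/q}}{\varepsilon^{(q+1)/q}}\right) \le \left(\frac{2s_nM_n}{\varepsilon} + 1\right)\exp\left(\frac{C(2s_nM_n + \varepsilon)^{1/q}}{\varepsilon^{(q+1)/q}}\right). \tag{18}$$

Then we bound $|\phi(y, v) - f_j(y, v)|$ by

$$\left|\sum_{k=1}^{J_\varepsilon}\{\phi(y, v) - \phi(y, b_k)\}A_k(v)\right| + \left|\sum_{k=1}^{J_\varepsilon}\{\phi(y, b_k) - f_{k,j}(y)\}A_k(v)\right|. \tag{19}$$

The second term is bounded by $\sum_{k=1}^{J_\varepsilon}\Delta_{k,j}(y)A_k(v)$, and the first term, by
$\varepsilon$, due to (17). Hence

$$|\phi(y, v) - f_j(y, v)| \le \Delta_j(y, v),$$

where $\Delta_j(y, v) = \sum_{k=1}^{J_\varepsilon}\Delta_{k,j}(y)A_k(v) + \varepsilon$. By the Hölder inequality,

$$\left(\sum_{k=1}^{J_\varepsilon}\Delta_{k,j}(y)A_k(v)\right)^q \le \sum_{k=1}^{J_\varepsilon}\Delta_{k,j}^q(y)A_k(v).$$

Hence, $\int \Delta_j^q(y, v)P(dy, dv)$ is bounded by

$$C\sum_{k=1}^{J_\varepsilon}\int \Delta_{k,j}^q(y)P(dy) + C\varepsilon^q \le \frac{CJ_\varepsilon\varepsilon^{q+1}}{2s_nM_n + \varepsilon} + C\varepsilon^q \le 2C\varepsilon^q,$$

yielding the inequality [from (18)]

$$\log N_{[\,]}(C\varepsilon, \mathcal{F}_n, L_q(P)) \le C + \log(2s_nM_n + \varepsilon) + C(2s_nM_n + \varepsilon)^{1/q}/\varepsilon^{(q+1)/q} - \log(\varepsilon). \tag{20}$$

Now, take $(G_k, \Delta_k)_{k=1}^{N_1}$ such that to any $G \in \mathcal{G}$ corresponds $(G_j, \Delta_j)$ such
that $|G - G_j| \le \Delta_j$ and $E\Delta_j^q(X) < (\varepsilon/M_n)^q$. Take $(\phi_k, \Delta_k)_{k=1}^{N_2}$ such that
for any $\phi \in \mathcal{F}_n$, there exists $(\phi_k, \Delta_k)$ such that $|\phi(y, v) - \phi_k(y, v)| \le \Delta_k(y, v)$
and $E\Delta_k^q(Y, G_j(X)) < \varepsilon^q$. Now, we bound $|\phi(y, G(x)) - \phi_k(y, G_j(x))|$ by

$$\begin{aligned}
&|\phi(y, G(x)) - \phi(y, G_j(x))| + |\phi(y, G_j(x)) - \phi_k(y, G_j(x))| \\
&\qquad \le M_n\Delta_j(x) + \Delta_k(y, G_j(x)) \equiv \Delta_{j,k}(y, x).
\end{aligned}$$

Since $E\Delta_{j,k}^q(Y, X) \le C\varepsilon^q$, we conclude that

$$\log N_{[\,]}(C\varepsilon, \mathcal{J}_n^{\mathcal{G}}, L_q(P)) \le C\log N_{[\,]}(\varepsilon, \mathcal{F}_n, L_q(P)) + C\log N_{[\,]}(\varepsilon/M_n, \mathcal{G}, L_q(P)).$$

Combined with (20), this yields the desired result. □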

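Fix $\lambda_0 : \mathbf{R}^{d_X} \to \mathbf{R}$ in a uniformly bounded class $\Lambda$, and let $F_0$ be the
distribution function of $\lambda_0(X)$. Let $F_{n,\lambda_0,i}$ and $F_{n,\lambda,i}$ be the empirical distribution
functions of $\{\lambda_0(X_j)\}_{j=1,j\ne i}^n$ and $\{\lambda(X_j)\}_{j=1,j\ne i}^n$, $\{X_j\}_{j=1}^n$ being i.i.d. $\sim P$.

Lemma A3. Define $\Lambda_n = \{\lambda \in \Lambda : \|\lambda - \lambda_0\|_\infty \le Cn^{-1/2}\}$ for $C > 0$ and
assume that $\Lambda_n$ satisfies the conditions for $\Lambda$ in Lemma A1 and that $\log N(C\varepsilon, \Lambda_n,
\|\cdot\|_\infty) \le C\varepsilon^{-r_\Lambda}$ for some $C > 0$ and $r_\Lambda \in [0, 1)$. Then,

$$E\left[\max_{1\le i\le n}\ \sup_{\lambda\in\Lambda_n}\ \sup_{x\in\mathbf{R}^{d_X}} |F_{n,\lambda,i}(\lambda(x)) - F_0(\lambda_0(x))|\right] = O(n^{-1/2}).$$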



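Proof. Let every $\lambda$ in $\Lambda$ be bounded in $[-M, M]$. It suffices to show
that

$$\begin{aligned}
&\sup_{\lambda\in\Lambda_n}\ \sup_{x\in\mathbf{R}^{d_X}} |F_\lambda(\lambda(x)) - F_0(\lambda_0(x))| = O(n^{-1/2}) \quad\text{and} \\
&E\left[\max_{1\le i\le n}\ \sup_{\lambda\in\Lambda_n}\ \sup_{\bar\lambda\in[-M,M]} |F_{n,\lambda,i}(\bar\lambda) - F_\lambda(\bar\lambda)|\right] = O(n^{-1/2}).
\end{aligned} \tag{21}$$

The first statement of (21) follows, due to $\lambda_0(X)$ having a bounded density
and $\sup_{\lambda\in\Lambda_n}\|\lambda - \lambda_0\|_\infty = O(n^{-1/2})$ by the definition of $\Lambda_n$.
As for the second statement, write

$$|F_{n,\lambda,i}(\bar\lambda) - F_\lambda(\bar\lambda)| \le \left|\frac{1}{n-1}\sum_{j=1,j\ne i}^n \big(1\{\lambda(X_j) \le \bar\lambda\} - P\{\lambda(X_j) \le \bar\lambda\}\big)\right|.$$

For any $\lambda_1, \lambda_2 \in \Lambda_n$ and $\bar\lambda_1, \bar\lambda_2 \in [-M, M]$,

$$E|1\{\lambda_1(X_j) \le \bar\lambda_1\} - 1\{\lambda_2(X_j) \le \bar\lambda_2\}| \le C\{\|\lambda_1 - \lambda_2\|_\infty + |\bar\lambda_1 - \bar\lambda_2|\}.$$

This implies that

$$\log N_{[\,]}(C\varepsilon, \mathcal{F}_n, L_2(P)) \le \log N(C\varepsilon^2, \Lambda_n \times [-M, M], \|\cdot\|_\infty + |\cdot|) \le C\varepsilon^{-2r_\Lambda},$$

where $\mathcal{F}_n = \{1\{\lambda(\cdot) \le \bar\lambda\} : (\lambda, \bar\lambda) \in \Lambda_n \times [-M, M]\}$. Since $r_\Lambda < 1$, the second
result of (21) follows from the maximal inequality of Pollard (1989) [see
Theorem A.2 of van der Vaart (1996)]. □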

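Let $\Lambda_n$, $\lambda_0$, $F_0$, and $F_{n,\lambda,i}$ be as in Lemma A3 and define

$$U_{n,\lambda,i} = F_{n,\lambda,i}(\lambda(X_i)) \quad\text{and}\quad U = F_0(\lambda_0(X))$$

for $\lambda \in \Lambda_n$. Let $S_W$ be a subset of $\mathbf{R}^{d_W}$ and introduce $\Phi_n$, a class of functions
$\varphi : S_W \to \mathbf{R}$. Then we define a kernel estimator of $g_\varphi(u) = E[\varphi(W)|U = u]$:

$$\hat g_{\varphi,\lambda,i}(u) = \frac{\sum_{j=1,j\ne i}^n \varphi(W_j)K_h(U_{n,\lambda,j} - u)}{\sum_{j=1,j\ne i}^n K_h(U_{n,\lambda,j} - u)}. \tag{22}$$

The following lemma proves uniform convergence of $\hat g_{\varphi,\lambda,i}(u)$ over $(\varphi, \lambda) \in
\Phi_n \times \Lambda_n$. A related result without the transform $F_{n,\lambda,i}$ was obtained by
Andrews (1995).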
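
The two ingredients of (22), the leave-one-out empirical transform $U_{n,\lambda,i}$ and the kernel ratio, can be sketched as follows. This is an illustrative implementation under stated assumptions: the Epanechnikov kernel is one admissible choice under (A4) below, $K_h(s) = K(s/h)/h$ is assumed, and all function names are hypothetical.

```python
import numpy as np

def epanechnikov(s):
    """Symmetric, nonnegative kernel on [-1, 1] that integrates to one."""
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1.0, 0.75 * (1.0 - s ** 2), 0.0)

def loo_ecdf_transform(index_vals):
    """U_i = fraction of the other n - 1 index values that are <= the i-th one."""
    v = np.asarray(index_vals, dtype=float)
    counts = (v[None, :] <= v[:, None]).sum(axis=1) - 1   # drop the j = i term
    return counts / (v.size - 1)

def g_hat(u, i, U, phi_W, h):
    """Leave-one-out kernel estimate of E[phi(W) | U = u], as in (22)."""
    mask = np.arange(U.size) != i
    w = epanechnikov((U[mask] - u) / h) / h
    denom = w.sum()
    return float("nan") if denom == 0.0 else float(np.dot(w, phi_W[mask]) / denom)

U_demo = loo_ecdf_transform([3.0, 1.0, 2.0])   # ranks rescaled to [0, 1]
U_grid = np.linspace(0.0, 1.0, 11)
est = g_hat(0.5, 0, U_grid, 2.0 * U_grid, 0.3)
```

For a linear regression function and a symmetric design around an interior point, the estimate reproduces the function value, whereas near $u = 0$ or $u = 1$ the kernel window is truncated; this is the source of the first-order boundary term in $\Delta_n(u)$ of the lemma below.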

Lemma A4. Let $\Lambda_n$ be as in Lemma A3. Furthermore, assume the following:

(A1) $\log N(C\varepsilon, \Lambda_n, \|\cdot\|_\infty) \le C\varepsilon^{-r_\Lambda}$ for some $C > 0$ and $r_\Lambda \in [0, 1)$.

(A2) $\Phi_n$ has an envelope $\bar\varphi$ such that $\|\bar\varphi\|_{P,2} < \infty$,

$$\sup_{u\in[0,1]} E[|\bar\varphi(W_i)|\,|\,U_i = u] < \infty,$$

and for some $r_\Phi \in [0, 2)$, $\log N_{[\,]}(\varepsilon, \Phi_n, L_2(P)) \le \varepsilon^{-r_\Phi}$.

(A3) $g_\varphi(\cdot)$, $\varphi \in \Phi_n$, is twice continuously differentiable with uniformly
bounded derivatives.

(A4) $K$ is symmetric, nonnegative, continuously differentiable, has a compact
support, and $\int K(s)\,ds = 1$.

Then as $h \to 0$ and $n^{-1/2}h^{-1} \to 0$ with $n \to \infty$,

$$\max_{1\le i\le n}\ \sup_{(\varphi,\lambda)\in\Phi_n\times\Lambda_n} |\hat g_{\varphi,\lambda,i}(u) - g_\varphi(u)| \le \Delta_n(u) + O(n^{-1/2}h^{-1}\sqrt{-\log h}),$$

where $\Delta_n(u) = C(h^2 1\{|u - 1| > h\} + h1\{|u - 1| \le h\})$ for some $C > 0$.
Furthermore, if $\Phi_n$ is uniformly bounded,

$$\max_{1\le i\le n}\ \sup_{(\varphi,\lambda)\in\Phi_n\times\Lambda_n}\ \sup_{u\in[0,1]} |\hat g_{\varphi,\lambda,i}(u) - g_\varphi(u)| \le \Delta_n(u) + O(n^{-1/2}h^{-1}).$$

Proof. We consider the first statement. Without loss of generality, assume
that $K$ has a support in $[-1, 1]$, and $K(\cdot) \le M$ for some $M > 0$. By
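Lemma A3, it suffices to focus on the event $\sup_x |F_{n,\lambda,i}(\lambda(x)) - F_0(\lambda_0(x))| \le
Mn^{-1/2}$ for large $M > 0$. Define

$$\hat\rho_{\varphi,\lambda,i}(u) = \frac{1}{n-1}\sum_{j=1,j\ne i}^n K_h(U_{n,\lambda,j} - u)\varphi(W_j)$$

and

$$\hat f_{\lambda,i}(u) = \frac{1}{n-1}\sum_{j=1,j\ne i}^n K_h(U_{n,\lambda,j} - u).$$

First, let us prove that uniformly over $1 \le i \le n$,

$$\sup_{(\varphi,\lambda)\in\Phi_n\times\Lambda_n} |\hat\rho_{\varphi,\lambda,i}(u) - \xi_{1n}(u)g_\varphi(u)| \le \Delta_n(u) + O(n^{-1/2}h^{-1}\sqrt{-\log h}), \tag{23}$$

where $\xi_{1n}(u) = \int_{(-u/h)\vee(-1)}^{((1-u)/h)\wedge 1} K(v)\,dv$. Write $\hat\rho_{\varphi,\lambda,i}(u) - \xi_{1n}(u)g_\varphi(u)$ as

$$\begin{aligned}
&\frac{1}{n-1}\sum_{j=1,j\ne i}^n \{K_h(U_{n,\lambda,j} - u) - K_h(U_j - u)\}\varphi(W_j) \\
&\qquad + \frac{1}{n-1}\sum_{j=1,j\ne i}^n K_h(U_j - u)\varphi(W_j) - \xi_{1n}(u)g_\varphi(u) = A_{1n} + A_{2n}.
\end{aligned}$$

We deal with $A_{1n}$ following the proof of Lemma 4.5 of Stute and Zhu (2005).
Since $K$ has support in $[-1, 1]$, $A_{1n}$ is the sum of those $j$'s such that either
$|U_{n,\lambda,j} - u| \le h$ or $|U_j - u| \le h$. By Lemma A3,

$$\max_{1\le j\le n}\ \sup_{\lambda\in\Lambda_n} |U_{n,\lambda,j} - U_j| = O_P(n^{-1/2}).$$

Since $n^{-1/2}h^{-1} \to 0$, $A_{1n}$ is equal to

$$\frac{1}{(n-1)h^2}\sum_{j=1,j\ne i}^n K'(\Delta_{ij})\varphi(W_j)(U_{n,\lambda,j} - U_j)1\{|U_j - u| \le Ch\} \tag{24}$$

with probability approaching one, where $\Delta_{ij}$ lies between $(U_{n,\lambda,j} - u)/h$ and
$(U_j - u)/h$. Therefore, by (A4), the term in (24) is bounded by

$$\frac{Cn^{-1/2}}{(n-1)h^2}\sum_{j=1,j\ne i}^n |\bar\varphi(W_j)|\,1\{|U_j - u| \le Ch\}$$

with probability approaching one. By (A2), the expectation of the above
sum is $O(h^{-1})$, yielding

$$A_{1n} = O_P(n^{-1/2}h^{-1}). \tag{25}$$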

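This leaves us to deal with $A_{2n}$ which we write as

$$\begin{aligned}
A_{2n} ={}& \frac{1}{n-1}\sum_{j=1,j\ne i}^n K_h(U_j - u)\{\varphi(W_j) - g_\varphi(U_j)\} \\
&+ \frac{1}{n-1}\sum_{j=1,j\ne i}^n K_h(U_j - u)g_\varphi(U_j) - \xi_{1n}(u)g_\varphi(u) \\
={}& \text{(I)} + \text{(II)}.
\end{aligned}$$

The sum (I) is a mean-zero process. Define $\phi_u(y, v) = K((y - u)/h)v$, $\tilde\varphi(w, u;
\varphi) = \varphi(w) - g_\varphi(u)$, and

$$\tilde\Phi_n = \{\tilde\varphi(\cdot, \cdot\,; \varphi) : \varphi \in \Phi_n\}$$

and

$$\mathcal{J}_{1n} = \{\phi_u(\cdot, \tilde\varphi(\cdot)) : (u, \tilde\varphi) \in [0, 1] \times \tilde\Phi_n\}.$$

Take $J(w, u) = M\{\bar\varphi(w) + g_{\bar\varphi}(u)\}$ as an envelope for $\mathcal{J}_{1n}$. Then, $E|J(W, U)|^2 <
\infty$. Note that

$$|\phi_{u_1}(y, \tilde\varphi(w, u; \varphi)) - \phi_{u_2}(y, \tilde\varphi(w, u; \varphi))| \le C|u_1 - u_2|\,|\bar\varphi(w) + g_{\bar\varphi}(u)|/h$$

and

$$|\phi_u(y, \tilde\varphi(w, u; \varphi_1)) - \phi_u(y, \tilde\varphi(w, u; \varphi_2))| \le C|\varphi_1(w) - \varphi_2(w)| + C|g_{\varphi_1}(u) - g_{\varphi_2}(u)|.$$

Since $E|g_{\varphi_1}(U) - g_{\varphi_2}(U)|^q \le E|\varphi_1(W) - \varphi_2(W)|^q$, $q \ge 1$, we conclude that

$$\begin{aligned}
\log N_{[\,]}(\varepsilon, \mathcal{J}_{1n}, L_2(P)) &\le \log N(C\varepsilon h, [0, 1], |\cdot|) + \log N_{[\,]}(C\varepsilon, \Phi_n, L_2(P)) \\
&\le C - (\log\varepsilon + \log h) + C\varepsilon^{-r_\Phi}.
\end{aligned} \tag{26}$$

Therefore, by the maximal inequality of Pollard (1989) [e.g., Theorem A.2
of van der Vaart (1996)],

$$\begin{aligned}
&E\left[\sup_{(u,\tilde\varphi)\in[0,1]\times\tilde\Phi_n}\left|\frac{1}{n-1}\sum_{j=1,j\ne i}^n K\!\left(\frac{U_j - u}{h}\right)\{\varphi(W_j) - g_\varphi(U_j)\}\right|\right] \\
&\qquad \le \frac{C}{\sqrt{n-1}}\int_0^1 \sqrt{1 + \log N_{[\,]}(\varepsilon, \mathcal{J}_{1n}, L_2(P))}\,d\varepsilon
= O(n^{-1/2}\sqrt{-\log h}).
\end{aligned} \tag{27}$$

Hence the uniform convergence rate for (I) is $O(n^{-1/2}h^{-1}\sqrt{-\log h})$.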
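Let us turn to (II). For this, write (II) as

$$\begin{aligned}
&\frac{1}{n-1}\sum_{j=1,j\ne i}^n \{K_h(U_j - u)g_\varphi(U_j) - E(K_h(U_j - u)g_\varphi(U_j))\} \\
&\qquad + E[K_h(U_j - u)g_\varphi(U_j)] - \xi_{1n}(u)g_\varphi(u).
\end{aligned} \tag{28}$$

From the steps to prove (27), the first sum is $O(n^{-1/2}h^{-1}\sqrt{-\log h})$ uniformly
over $u \in [0, 1]$. Write the last difference in (28), for $u \in [0, 1]$, as

$$\frac{1}{h}\int_0^1 K\!\left(\frac{v - u}{h}\right)g_\varphi(v)\,dv - g_\varphi(u)\xi_{1n}(u) = g'_\varphi(u)\xi_{2n}(u)h + O(h^2),$$

where $\xi_{2n}(u) = \int_{(-u/h)\vee(-1)}^{((1-u)/h)\wedge 1} vK(v)\,dv$. Therefore,

$$\text{(II)} \le \Delta_n(u) + O(n^{-1/2}h^{-1}\sqrt{-\log h}).$$

Combining (I) and (II), we obtain (23).

Following the proof of (23) with $\varphi = 1$, we also obtain

$$\sup_{(\varphi,\lambda,u)\in\Phi_n\times\Lambda_n\times[0,1]} |\hat f_{\lambda,i}(u) - \xi_{1n}(u)| = O(n^{-1/2}h^{-1}\sqrt{-\log h}). \tag{29}$$

The quantity $|\xi_{1n}(u) - 1\{u \in [0, 1]\}|$ is bounded by

$$\max\left\{\int_{-1}^0 K(v)\,dv, \int_0^1 K(v)\,dv\right\} \le 1/2.$$

Thus $\xi_{1n}(u) \ge 1/2$ for $u \in [0, 1]$. Write $\hat g_{\varphi,\lambda,i}(u) - g_\varphi(u)$ as

$$\hat\rho_{\varphi,\lambda,i}(u)\left\{\frac{1}{\hat f_{\lambda,i}(u)} - \frac{1}{\xi_{1n}(u)}\right\} + \frac{1}{\xi_{1n}(u)}\{\hat\rho_{\varphi,\lambda,i}(u) - \xi_{1n}(u)g_\varphi(u)\}.$$

We apply (29) and (23) to the first and second terms, respectively, to obtain the first
result of the lemma.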
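
The boundary factor $\xi_{1n}(u)$, the integral of $K$ over the part of its support that stays inside $[0,1]$ after rescaling, can be evaluated in closed form for a specific kernel. The sketch below assumes the Epanechnikov kernel $K(v) = 0.75(1 - v^2)$ on $[-1,1]$, which is only one kernel satisfying (A4); the function names are illustrative.

```python
def K_antiderivative(v):
    """Antiderivative of K(v) = 0.75 * (1 - v**2)."""
    return 0.75 * v - 0.25 * v ** 3

def xi_1n(u, h):
    """Integral of K over [max(-u/h, -1), min((1 - u)/h, 1)]."""
    lo = max(-u / h, -1.0)
    hi = min((1.0 - u) / h, 1.0)
    return K_antiderivative(hi) - K_antiderivative(lo)

h = 0.1
values = [xi_1n(k / 100.0, h) for k in range(101)]
```

For this kernel the factor equals one in the interior and drops to exactly one half at $u = 0$ and $u = 1$, in line with the bound $\xi_{1n}(u) \ge 1/2$ used above to control the denominator.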

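As for the second result, we modify the treatment of (I) above. Note that

$$\sup_{\eta\in[0,\varepsilon]}\ \sup_{(y,v)} |\phi_u(y, v + \eta) - \phi_u(y, v - \eta)| \le C\varepsilon.$$

Since $K(\cdot)$ is Lipschitz, it is of bounded variation, and so are $K((\cdot - u)/h)$.
Since $\Phi_n$ is uniformly bounded, we apply Lemma A2 (with $s_n$ equal to some
large $M > 0$),

$$\log N_{[\,]}(\varepsilon, \mathcal{J}_{1n}, L_2(P)) \le C + C\{\varepsilon^{-3/2} - \log(\varepsilon) + \varepsilon^{-r_\Phi}\}.$$

Substituting this bound for the one in (26) and following the same arguments
there, we obtain the wanted result. □

Let $\Lambda$ be a uniformly bounded class of real functions and fix $\lambda_0 \in \Lambda$.
Let $S_X$ be the support of a random vector $X$, and $[0, 1]^2$ be the support
of a random vector $(Z, Y)$. Let $f_\lambda(z|y, \bar\lambda_1, \bar\lambda_0)$ be the conditional density
of $Z$ given $Y = y$, $\lambda(X) = \bar\lambda_1$ and $\lambda_0(X) = \bar\lambda_0$ with respect to a $\sigma$-finite
conditional measure. Introduce

$$\mu_\lambda(z|y, \bar\lambda_1, \bar\lambda_0) = E[\gamma_z(Z)|Y = y, \lambda(X) = \bar\lambda_1, \lambda_0(X) = \bar\lambda_0]$$

and

$$\mu_{\lambda_0}(z|y, \bar\lambda_0) = E[\gamma_z(Z)|Y = y, \lambda_0(X) = \bar\lambda_0].$$

For each $\lambda \in \Lambda$, let $S(\lambda)$ be the support of $\lambda(X)$.

Lemma A5. Suppose that there exists $C > 0$ such that for each $\lambda_1 \in \Lambda$,
each $\delta > 0$ and each $(\bar\lambda_1, \bar\lambda_0) \in S(\lambda_1) \times S(\lambda_0)$,

$$\sup_{(z,y,\tilde\lambda_1)\in[0,1]^2\times S(\lambda_1)\,:\,|\bar\lambda_1 - \tilde\lambda_1|\le\delta} |f_{\lambda_1}(z|y, \tilde\lambda_1, \bar\lambda_0) - f_{\lambda_1}(z|y, \bar\lambda_1, \bar\lambda_0)| \le C\delta.$$

Then for each $\delta > 0$ and each $\lambda_1$ in $\Lambda$ such that $\|\lambda_1 - \lambda_0\|_\infty \le \delta$,

$$\sup_{(z,y,x)\in[0,1]^2\times S_X} |\mu_{\lambda_1}(z|y, \lambda_1(x), \lambda_0(x)) - \mu_{\lambda_0}(z|y, \lambda_0(x))| \le 6C\delta.$$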

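Proof. Choose $(z, y, x) \in [0, 1]^2 \times S_X$ and $\lambda_1 \in \Lambda$ with $\|\lambda_1 - \lambda_0\|_\infty \le \delta$
and let $\bar\lambda_0 = \lambda_0(x)$ and $\bar\lambda_1 = \lambda_1(x)$. Let $P_y$ be the conditional measure of
$(Z, X)$ given $(Y, \lambda_0(X)) = (y, \bar\lambda_0)$ and $E_y$ be the conditional expectation
under $P_y$. Let $A_j = 1\{|\lambda_j(X) - \bar\lambda_j| \le 3\delta\}$, $j = 0, 1$. Note that $E_y[A_0] = 1$ and

$$\begin{aligned}
1 \ge E_y[A_1] &= P\{|\lambda_1(X) - \bar\lambda_1| \le 3\delta\,|\,Y = y, \lambda_0(X) = \bar\lambda_0\} \\
&\ge P\{|\lambda_0(X) - \bar\lambda_0| \le \delta\,|\,Y = y, \lambda_0(X) = \bar\lambda_0\} = 1.
\end{aligned}$$

Let $\tilde\mu_{\lambda_j}(z|y, \bar\lambda_j, \bar\lambda_0) = E_y[\gamma_z(Z)A_j]/E_y[A_j] = E_y[\gamma_z(Z)A_j]$, $j = 0, 1$. Then

$$\begin{aligned}
&|\mu_{\lambda_1}(z|y, \bar\lambda_1, \bar\lambda_0) - \mu_{\lambda_0}(z|y, \bar\lambda_0)| \\
&\qquad \le |\mu_{\lambda_1}(z|y, \bar\lambda_1, \bar\lambda_0) - \tilde\mu_{\lambda_1}(z|y, \bar\lambda_1, \bar\lambda_0)| + |\tilde\mu_{\lambda_1}(z|y, \bar\lambda_1, \bar\lambda_0) - \mu_{\lambda_0}(z|y, \bar\lambda_0)| \\
&\qquad = \text{(I)} + \text{(II)}.
\end{aligned}$$

Let us turn to (I). By the definition of conditional expectation,

$$\tilde\mu_{\lambda_1}(z|y, \bar\lambda_1, \bar\lambda_0) = \int_{\bar\lambda_1 - 3\delta}^{\bar\lambda_1 + 3\delta} \mu_{\lambda_1}(z|y, \bar\lambda, \bar\lambda_0)\,dF_{\lambda_1}(\bar\lambda|y, \bar\lambda_0),$$

where $F_{\lambda_1}(\cdot|y, \bar\lambda_0)$ is the conditional c.d.f. of $\lambda_1(X)$ given $(Y, \lambda_0(X)) =
(y, \bar\lambda_0)$. Because $\int_{\bar\lambda_1 - 3\delta}^{\bar\lambda_1 + 3\delta} dF_{\lambda_1}(\bar\lambda|y, \bar\lambda_0) = E_y[A_1] = 1$ and $|\gamma_z(Z)| \le 1$ with probability one,

$$\text{(I)} \le \sup_{v\in[-3\delta,3\delta]\,:\,\bar\lambda_1 + v\in S(\lambda_1)} |f_{\lambda_1}(z|y, \bar\lambda_1 + v, \bar\lambda_0) - f_{\lambda_1}(z|y, \bar\lambda_1, \bar\lambda_0)|.$$

Therefore, by the condition of the lemma, (I) $\le 3C\delta$.
Let us turn to (II) which we write as

$$|E_y[\gamma_z(Z)A_1] - E_y[\gamma_z(Z)]| = |E_y[VA_1]|,$$

where $V = \gamma_z(Z) - E_y[\gamma_z(Z)]$ because $E_y[A_1] = 1$. The term (II) is bounded
by the absolute value of

$$\begin{aligned}
&\int_{\bar\lambda_1 - 3\delta}^{\bar\lambda_1 + 3\delta} E[VA_1|Y = y, \lambda_1(X) = \bar\lambda, \lambda_0(X) = \bar\lambda_0]\,dF_{\lambda_1}(\bar\lambda|y, \bar\lambda_0) \\
&\qquad = \int_{\bar\lambda_1 - 3\delta}^{\bar\lambda_1 + 3\delta} E[V|Y = y, \lambda_1(X) = \bar\lambda, \lambda_0(X) = \bar\lambda_0]\,dF_{\lambda_1}(\bar\lambda|y, \bar\lambda_0),
\end{aligned}$$

or by $3C\delta$, similarly as before. This implies that (II) $\le 3C\delta$. □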
A.2. Proofs of the main results.

Proof of Theorem 1. (i) We first prove the following three claims:

C1. $\sup_{(\theta,x)\in B(\theta_0,Cn^{-1/2})\times\mathbf{R}^{d_X}} |F_{n,\theta,i}(\lambda_\theta(x)) - F_0(\lambda_0(x))| = O_P(n^{-1/2})$.

C2. $\frac{1}{\sqrt n}\sum_{i=1}^n \beta_u(\hat U_i)\{\gamma_z(\hat Z_i) - \gamma_z(Z_i)\}\gamma_y^{\perp}(Y_i) = o_P(1)$.

C3. $\frac{1}{\sqrt n}\sum_{i=1}^n \beta_u(\hat U_i)\{\gamma_y(\hat Y_i) - \gamma_y(Y_i)\}\{\gamma_z(\hat Z_i) - \gamma_z(Z_i)\} = o_P(1)$.

The $o_P(1)$'s in C2 and C3 are uniform in $(u, y, z) \in [0, 1]^3$.

Proof of C1. Let $\Lambda_n = \{\lambda_\theta : \theta \in B(\theta_0, Cn^{-1/2})\}$. By Assumption 1(ii)(a),

$$\log N_{[\,]}(\varepsilon, \Lambda_n, \|\cdot\|_\infty) \le \log N(C\varepsilon, B(\theta_0, Cn^{-1/2}), \|\cdot\|) \le -C\log\varepsilon, \tag{30}$$

because $B(\theta_0, Cn^{-1/2})$ is compact. Hence C1 follows by Lemma A3.

For the proofs of C2 and C3, we assume that $\hat U_i$, $\hat Z_i$, and $\hat Y_i$ are estimators
using the whole sample, not leave-one-out estimators. The discrepancy due
to this assumption is easily shown to be asymptotically negligible.

Proof of C2. Observe that

$$\begin{aligned}
\Delta_{n,i}(x; \theta) &\equiv \sup_{y\in\mathbf{R}} |\hat F_{Y|U}(y|F_{n,\theta,i}(\lambda_\theta(x))) - F_{Y|U}(y|F_0(\lambda_0(x)))| \\
&\le \sup_{y\in\mathbf{R}} |\hat F_{Y|U}(y|\hat U_i) - F_{Y|U}(y|\hat U_i)| + \sup_{y\in\mathbf{R}} |F_{Y|U}(y|\hat U_i) - F_{Y|U}(y|U_i)| \\
&\le \sup_{y\in\mathbf{R}} |\hat F_{Y|U}(y|\hat U_i) - F_{Y|U}(y|\hat U_i)| + O_P(n^{-1/2})
\end{aligned}$$

by C1 and Assumption 2(ii). Take large $M > 0$ and let

$$v_n(u) = M(h^2 1\{|u - 1| > h/2\} + h1\{|u - 1| \le 2h\} + n^{-1/2}h^{-1}). \tag{31}$$

By C1 and Lemma A4, with large probability,

$$\sup_{\theta\in B(\theta_0,Mn^{-1/2})} \Delta_{n,i}(X_i, \theta) \le v_n(U_i) \quad\text{for all } i = 1, \ldots, n. \tag{32}$$

Take large $M$ such that $\sqrt{n}h^M = o(1)$ and expand the sum in C2 as, for
example,



M-i z M+l n 

s=l » i=l 

= -Bin + -B2n, 

where $Z_i^*$ lies between $\hat Z_i$ and $Z_i$ and

$$B_{1n,s} = \frac{1}{s!\sqrt n}\sum_{i=1}^n \beta_u(\hat U_i)\exp(Z_iz)\{\hat Z_i - Z_i\}^s\gamma_y^{\perp}(Y_i).$$

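Note that $B_{2n} = O_P(\sqrt{n}h^M) = o_P(1)$ by Lemma A4. We consider $B_{1n,s}$. Fix
$c > 0$. Given $x^{(n)} \in \mathbf{R}^{nd_X}$, let $G_\theta(\cdot\,; x^{(n)}) = G_{n,\lambda}(\lambda(\cdot); x^{(n)})$ with $\lambda = \lambda_\theta$ in
(16). Denote

$$\mathcal{G}_{1n} = \{G_\theta(\cdot\,; x^{(n)}) : (x^{(n)}, \theta) \in \mathbf{R}^{nd_X} \times B(\theta_0, Mn^{-1/2})\}.$$

Then let $\mathcal{G}_n$ be the collection of $G$'s in $\mathcal{G}_{1n}$ such that $\|G - G_0\|_\infty \le Mn^{-1/2}$,
where $G_0 = F_0 \circ \lambda_0$. Define $\mathcal{B}_n = \{\beta_u \circ G : (u, G) \in [0, 1] \times \mathcal{G}_n\}$. By Corollary
A1 and (30),

$$\log N_{[\,]}(\varepsilon, \mathcal{B}_n, L_2(P)) \le C\varepsilon^{-1} - C\log\varepsilon. \tag{33}$$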

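Fix a small $c > 0$ and let $\mathcal{X}_n \subset \mathbf{R}^{nd_X} \times \mathcal{G}_n$ be the collection of $(x^{(n)}, G)$'s
such that $G \in \mathcal{G}_n$ and

$$\frac{1}{n}\sum_{j=1}^n K_h(G(x_j) - u) > c > 0. \tag{34}$$

Define $\mathcal{J}_n$ to be the set

$$\{\phi(\cdot, G(\cdot); z^{(n)}, x^{(n)}, G) : (z^{(n)}, x^{(n)}, G) \in \mathbf{R}^n \times \mathcal{X}_n\},$$

where $z^{(n)} = \{z_j\}_{j=1}^n$ and

$$\phi(z, u; z^{(n)}, x^{(n)}, G) = \frac{\sum_{j=1}^n 1\{z_j \le z\}K_h(G(x_j) - u)}{\sum_{j=1}^n K_h(G(x_j) - u)}.$$

Then let $\mathcal{J}_n^{\mathcal{G}} \subset \mathcal{J}_n$ be the class of functions $\phi(\cdot, G(\cdot))$ satisfying

$$|\phi(\cdot\,; G(\cdot)) - F_{Z|U}(\cdot|G_0(\cdot))| \le v_n(G_0(\cdot)). \tag{35}$$

Fix any arbitrarily small $\varpi > 0$. By Assumption 3(i), Lemma A4, the fact
that $|\xi_{1n}(u)| \ge 1/2$ there, and (32), from sufficiently large $n$ on,

$$P\{\hat F_{Z|U}(\cdot|F_{n,\hat\theta}(\lambda_{\hat\theta}(\cdot))) \in \mathcal{J}_n^{\mathcal{G}}\} \ge 1 - \varpi$$

by increasing $M$ in the definition of $v_n$ sufficiently. We will now compute a
bracketing entropy bound for $\mathcal{J}_n^{\mathcal{G}}$.

We apply Corollary A1 and (30) to obtain that

$$\log N_{[\,]}(\varepsilon, \mathcal{G}_n, L_q(P)) \le -C\log\varepsilon + C/\varepsilon. \tag{36}$$

Note that $\phi \in \mathcal{J}_n$ is uniformly bounded [under the restriction (34)] and
$\phi(z, u; z^{(n)}, x^{(n)}, G)$ is increasing in $z$ for all $u$ and that by (34),

$$|\phi(z, v + \eta; z^{(n)}, x^{(n)}, G) - \phi(z, v - \eta; z^{(n)}, x^{(n)}, G)| \le C\eta/h^3.$$

Take $q > 3$ and apply Lemma A2 and (36) to obtain that

$$\log N_{[\,]}(C\varepsilon, \mathcal{J}_n^{\mathcal{G}}, L_q(P)) \le -C\log\varepsilon - C\log h + Ch^{-3/q}\varepsilon^{-(q+1)/q}. \tag{37}$$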

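Let $\phi_0(Z, X) = F_{Z|U}(Z|G_0(X))$ and

$$f_{\beta,z,y,\phi}(X, Z, Y) = z^{s+1}\beta(X)e^{z\phi_0(Z,X)}\{\phi(Z, X) - \phi_0(Z, X)\}^s\gamma_y^{\perp}(Y).$$

Let $\Pi_n = \{f_{\beta,z,y,\phi} : (\beta, z, y, \phi) \in \mathcal{B}_n \times [0, 1]^2 \times \mathcal{J}_n^{\mathcal{G}}\}$. We take its envelope to
be $v_n^s(G_0(\cdot))$, for which we observe that

$$E[v_n^{2s}(G_0(X))] = \int_0^1 v_n^{2s}(u)\,du = O(h^{2s+1}). \tag{38}$$

For $\Gamma = \{\gamma_y : y \in [0, 1]\}$, $\log N_{[\,]}(\varepsilon, \Gamma, L_2(du)) \le -C\log\varepsilon$, with $du$ denoting
the Lebesgue measure on $[0, 1]$. Using this, (33) and (37), we conclude that

$$\log N_{[\,]}(\varepsilon, \Pi_n, L_2(P)) \le Ch^{-3/q}\varepsilon^{-(q+1)/q} - C\log h - C\log\varepsilon. \tag{39}$$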

Now, let us prove that $B_{1n,s} = o_P(1)$, $s = 1, \ldots, M - 1$. With large probability,

$$|B_{1n,s}| \le \sup_{f\in\Pi_n} |\sqrt{n}(P_n - P)f| + \sup_{f\in\Pi_n} |\sqrt{n}Pf|,$$

where $P_n$ denotes the empirical measure of $(Z_i, Y_i, X_i)_{i=1}^n$. Hence by the
maximal inequality [Pollard (1989); see also Theorem A.2 of van der Vaart
(1996)] and by (38) and (39),

$$\sup_{f\in\Pi_n} |\sqrt{n}(P_n - P)f| \le Ch^{-3/(2q)}\int_0^{Ch^{s+1/2}} \varepsilon^{-(q+1)/(2q)}\,d\varepsilon + o_P(1) = o_P(1).$$

We conclude that $|B_{1n,s}| \le \sqrt{n}\sup_{f\in\Pi_n}|Pf| + o_P(1)$. By (35), the absolute
value on the right-hand side is bounded by

$$E\big[|\bar\gamma_{z,v_n}(Z_i; U_i)|\,|E[\gamma_y^{\perp}(Y_i)|Z_i, \lambda_\theta(X_i), \lambda_0(X_i)]|\big] \le \text{I} + \text{II}, \tag{40}$$

where $\bar\gamma_{z,v_n}(\bar z; u) = z^{s+1}e^{z\bar z}v_n(u)^s$,

$$\text{I} = E\big[|\bar\gamma_{z,v_n}(Z_i; U_i)|\,|E[\gamma_y^{\perp}(Y_i)|Z_i, \lambda_0(X_i)]|\big]$$

and

$$\text{II} = E\big[|\bar\gamma_{z,v_n}(Z_i; U_i)|\,|E[\gamma_y^{\perp}(Y_i)|Z_i, \lambda_\theta(X_i), \lambda_0(X_i)] - E[\gamma_y^{\perp}(Y_i)|Z_i, \lambda_0(X_i)]|\big].$$

Term I is bounded by

$$o(1) \times \Big(\sup_{(z,u)} |E[\gamma_y^{\perp}(Y_i)|Z_i = z, U_i = u]|\Big) = o(1) \times O(n^{-1/2}) = o(n^{-1/2}),$$

under the null hypothesis or the local alternatives of the theorem. By Assumption
1(iii) with the aid of Lemma A5, term II is bounded by

$$E[|\bar\gamma_{z,v_n}(Z_i; U_i)|] \times C\|\lambda_\theta - \lambda_0\|_\infty = o(n^{-1/2})$$

uniformly over $(z, y, \theta) \in [0, 1]^2 \times B(\theta_0, Mn^{-1/2})$. Therefore, $B_{1n,s} = o_P(1)$,
completing the proof. □

Proof of C3. With large probability, the absolute value of the sum in
C3 is bounded by

$$\frac{C}{\sqrt n}\sum_{i=1}^n \gamma_{y,v_n}(Y_i, U_i)\gamma_{z,v_n}(Z_i, U_i) + o_P(1),$$

where $\gamma_{y,v_n}(\bar y; u) = \gamma_y(\bar y - v_n(u)) - \gamma_y(\bar y + v_n(u))$. Define $\mathcal{J}_{2n} = \{\gamma_{y,v_n}(\cdot, \cdot) \times
\gamma_{z,v_n}(\cdot, \cdot) : (z, y) \in [0, 1]^2\}$. Note that

$$P\phi^2 \le CE[\gamma_{y,v_n}(Y_i, U_i)\gamma_{z,v_n}(Z_i, U_i)] = O(h^3) \to 0 \quad\text{for each } \phi \in \mathcal{J}_{2n}.$$

This implies that the finite-dimensional distributions of $\{\sqrt{n}(P_n - P)\phi : \phi \in
\mathcal{J}_{2n}\}$ converge in distribution to zero. By Theorem 3.3 of Ossiander (1987),
$\sqrt{n}(P_n - P)\phi$ is asymptotically tight in $\ell^\infty(\mathcal{J}_{2n})$ because $\phi \in \mathcal{J}_{2n}$ is uniformly
bounded and

$$\log N_{[\,]}(\varepsilon, \mathcal{J}_{2n}, L_2(P)) \le 2\log N_{[\,]}(C\varepsilon, \Gamma, L_2(P)) \le -C\log\varepsilon. \tag{41}$$

Hence $\sup_{\phi\in\mathcal{J}_{2n}} |\sqrt{n}(P_n - P)\phi| = o_P(1)$. We are left with $\sqrt{n}P\phi$ to analyze.
Let $1_n = 1\{|U_i - 1| > 2h\}$. Under the null hypothesis or the local alternatives
in the theorem,

$$\begin{aligned}
\sqrt{n}P\phi &= \sqrt{n}E[\gamma_{y,v_n}(Y_i, U_i)\gamma_{z,v_n}(Z_i, U_i)] \\
&= \sqrt{n}E[\gamma_{y,v_n}(Y_i, U_i)E[\gamma_{z,v_n}(Z_i, U_i)|Y_i, U_i]1_n] \\
&\quad + \sqrt{n}E[\gamma_{y,v_n}(Y_i, U_i)E[\gamma_{z,v_n}(Z_i, U_i)|Y_i, U_i](1 - 1_n)] \\
&= O(\sqrt{n}w_n^2) + O(\sqrt{n}h^3) + O(h) = o(1), \qquad w_n = n^{-1/2}h^{-1},
\end{aligned}$$

because the expectation of $1\{|U_i - 1| \le 2h\}$ is $O(h)$ and $n^{-1/2}h^{-2} + n^{1/2}h^3 \to 0$
as $n \to \infty$ by Assumption 3(ii)(b). Hence $\sup_{\phi\in\mathcal{J}_{2n}} \sqrt{n}P\phi = o(1)$, establishing
C3.

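Now we turn to the proof of Theorem 1. Without loss of generality, let
$\beta_u$ be monotone increasing. First we write $\hat\nu_n(r) - \nu_n(r)$ as

$$\begin{aligned}
&\frac{1}{\sqrt n}\sum_{i=1}^n \beta_u(\hat U_i)\{\gamma_z^{\perp}(\hat Z_i)\gamma_y^{\perp}(\hat Y_i) - \gamma_z^{\perp}(Z_i)\gamma_y^{\perp}(Y_i)\} \\
&\qquad + \frac{1}{\sqrt n}\sum_{i=1}^n \{\beta_u(\hat U_i) - \beta_u(U_i)\}\gamma_z^{\perp}(Z_i)\gamma_y^{\perp}(Y_i) \\
&\qquad = A_{1n} + A_{2n}.
\end{aligned} \tag{42}$$

We show that $A_{jn} = o_P(1)$, $j = 1, 2$. Write $\gamma_z^{\perp}(\hat Z_i)\gamma_y^{\perp}(\hat Y_i) - \gamma_z^{\perp}(Z_i)\gamma_y^{\perp}(Y_i)$ as

$$\begin{aligned}
&\{\gamma_z^{\perp}(\hat Z_i) - \gamma_z^{\perp}(Z_i)\}\gamma_y^{\perp}(Y_i) \\
&\quad + \{\gamma_z^{\perp}(\hat Z_i) - \gamma_z^{\perp}(Z_i)\}\{\gamma_y^{\perp}(\hat Y_i) - \gamma_y^{\perp}(Y_i)\} \\
&\quad + \{\gamma_y^{\perp}(\hat Y_i) - \gamma_y^{\perp}(Y_i)\}\gamma_z^{\perp}(Z_i).
\end{aligned} \tag{43}$$

By decomposing $A_{1n}$ into three terms according to the decomposition in
(43) and applying C2 and C3, we obtain that $A_{1n} = o_P(1)$.
As for $A_{2n}$, define

$$\mathcal{J}_{3n} = \{(\beta_u \circ G - \beta_u \circ G_0)\gamma_y^{\perp}\gamma_z^{\perp} : (G, u, y, z) \in \mathcal{G}_n \times [0, 1]^3\}.$$

Following the arguments used to show $\sup_{f\in\Pi_n} |\sqrt{n}(P_n - P)f| = o_P(1)$ in
the proof of C2, we can write

$$|A_{2n}| \le \sup_{f\in\mathcal{J}_{3n}} \sqrt{n}|Pf| + o_P(1).$$

Similarly, as in the steps in and below (40), the leading term above is $o_P(1)$
due to the fact that

$$\sup_{(z,y,u)\in[0,1]^3} |E[\gamma_y^{\perp}(Y_i)\gamma_z^{\perp}(Z_i)|U_i = u]| = O(n^{-1/2})$$

under the null hypothesis and the local alternatives and the fact that

$$\begin{aligned}
&\sup_{(G,u)\in\mathcal{G}_n\times[0,1]} E[|\beta_u(G(X_i)) - \beta_u(U_i)|] \\
&\qquad \le \sup_{u\in[0,1]} \int_0^1 \{\beta_u(\bar u + Mn^{-1/2}) - \beta_u(\bar u - Mn^{-1/2})\}\,d\bar u = O(n^{-1/2}),
\end{aligned}$$

which follows because $\beta_u$ is increasing and bounded in $[0, 1]$.

(ii) Clearly the class $\mathcal{J} = \{\beta_u\gamma_y^{\perp}\gamma_z^{\perp} : (u, y, z) \in [0, 1]^3\}$ is $P$-Donsker because
each of its elements is a product of uniformly bounded monotone functions.
The weak convergence of $\nu_n(r)$ immediately follows, and the weak convergence
of $\hat\nu_n(r)$ follows from (i). □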

Proof of Theorem 2. Let $\nu_{n,b}^{0*}(r) = \frac{1}{\sqrt n}\sum_{i=1}^n \omega_{i,b}\beta_u(U_i)\gamma_z^{\perp}(Z_i)\gamma_y^{\perp}(Y_i)$.
Also let $P_\omega$ be the distribution of $(\omega_{i,b})_{i=1}^n$ and $E_\omega$ be the associated expectation.
It suffices to show that for each $\varepsilon > 0$,

$$P_\omega\Big\{\sup_{r\in[0,1]^3} |\nu_{n,b}^*(r) - \nu_{n,b}^{0*}(r)| > \varepsilon\Big\} = o_P(1). \tag{44}$$

This is because the class $\mathcal{J}$ is $P$-Donsker as we saw in the proof of Theorem
1(ii), and by the conditional multiplier uniform CLT of Ledoux and
Talagrand (1988) [e.g., Theorem 2.9.7 in van der Vaart and Wellner (1996)],

d(F* I/ o,,F Tu ) = o(l) a.s. 
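We turn to (44). Let $S_i = (\hat U_i, \hat Z_i, \hat Y_i, U_i, Z_i, Y_i)$ and

$$\xi(S_i; r) = \beta_u(\hat U_i)\gamma_z^{\perp}(\hat Z_i)\gamma_y^{\perp}(\hat Y_i) - \beta_u(U_i)\gamma_z^{\perp}(Z_i)\gamma_y^{\perp}(Y_i).$$

Then note that for all $r \in [0, 1]^3$,

$$E_\omega\left[\left(\frac{1}{\sqrt n}\sum_{i=1}^n \omega_{i,b}\xi(S_i; r)\right)^2\right] = \frac{1}{n}\sum_{i=1}^n \xi^2(S_i; r) = o_P(1), \tag{45}$$

using the proof of Theorem 1. Let $\xi_i(r) = \xi(S_i; r)$ and $\rho_n(r_1, r_2) = \sqrt{\frac{1}{n}\sum_{i=1}^n (\xi_i(r_1) - \xi_i(r_2))^2}$.
Observe that

$$-\frac{\sqrt5 + 1}{2}|\xi_i(r_1) - \xi_i(r_2)| \le \omega_{i,b}(\xi_i(r_1) - \xi_i(r_2)) \le \frac{\sqrt5 + 1}{2}|\xi_i(r_1) - \xi_i(r_2)|.$$

By Hoeffding's inequality [e.g., Lemma 3.5 of van de Geer (2000), page 33],

$$P_\omega\left\{\left|\frac{1}{\sqrt n}\sum_{i=1}^n \omega_{i,b}(\xi_i(r_1) - \xi_i(r_2))\right| > \varepsilon\right\} \le 2\exp\left(-\frac{2\varepsilon^2}{(\sqrt5 + 1)^2\rho_n(r_1, r_2)^2}\right).$$

Therefore, the process $\nu_{n,b}^*(r) - \nu_{n,b}^{0*}(r)$ is sub-Gaussian with respect to $\rho_n$.

Let us compute the covering number of $[0, 1]^3$ with respect to $\rho_n$. Let $P_n$
be the empirical measure of $(\hat U_i, \hat Z_i, \hat Y_i, U_i, Z_i, Y_i)_{i=1}^n$. Observe that $\xi(\cdot\,; r)$ is
of bounded variation, so that for $\Xi = \{\xi(\cdot\,; r) : r \in [0, 1]^3\}$,

$$\log N_{[\,]}(\varepsilon, \Xi, L_2(P_n)) \le \frac{C}{\varepsilon}.$$

Since $\rho_n(r_1, r_2) = \|\xi(\cdot\,; r_1) - \xi(\cdot\,; r_2)\|_{P_n,2}$, this implies that $\log N(\varepsilon, [0, 1]^3, \rho_n) \le
C/\varepsilon$.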

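Now, using Corollary 2.2.8 of van der Vaart and Wellner [(1996), page 
101], for any $r_0 \in [0,1]^3$, 

\[
E_\omega\Bigl[\sup_{r\in[0,1]^3}\Bigl|\frac{1}{\sqrt{n}}\sum_{i=1}^n \omega_{i,b}\,\xi(S_i;r)\Bigr|\Bigr]
\le E_\omega\Bigl[\Bigl|\frac{1}{\sqrt{n}}\sum_{i=1}^n \omega_{i,b}\,\xi(S_i;r_0)\Bigr|\Bigr] + C\int_0^{\infty}\sqrt{\log D(\varepsilon,\rho_n)}\,d\varepsilon, \tag{46}
\]

where $C$ is an absolute constant and $D(\varepsilon,\rho_n)$ is an $\varepsilon$-packing number of 
$[0,1]^3$ with respect to $\rho_n$. The leading term on the right-hand side vanishes 
in probability by (45). As for the second term, note that 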

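\[
\sup_{r_1,r_2\in[0,1]^3} \rho_n(r_1,r_2) \to_P 0 \quad \text{as } n \to \infty
\]

from the proof of Theorem 1. Therefore, we can take $\delta_n \to 0$ such that 

\[
P\Bigl\{\sup_{r_1,r_2\in[0,1]^3} \rho_n(r_1,r_2) < \delta_n\Bigr\} \to 1.
\]

With probability approaching one, 

\[
\int_0^{\infty}\sqrt{\log D(\varepsilon,\rho_n)}\,d\varepsilon = \int_0^{\delta_n}\sqrt{\log D(\varepsilon,\rho_n)}\,d\varepsilon \le C\int_0^{C\delta_n}\varepsilon^{-1/2}\,d\varepsilon \to 0.
\]

We conclude that the expectation on the left-hand side of (46) vanishes in 
probability. We obtain (44). □ 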
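
The vanishing of the entropy integral rests on an elementary computation, sketched here under the entropy bound $\log D(\varepsilon,\rho_n) \le C/\varepsilon$ obtained above, with $C$ a generic constant and the packing number equal to one for $\varepsilon$ beyond the $\rho_n$-diameter $\delta_n$:

```latex
\[
\int_0^{\delta_n}\sqrt{\log D(\varepsilon,\rho_n)}\,d\varepsilon
\le \sqrt{C}\int_0^{\delta_n}\varepsilon^{-1/2}\,d\varepsilon
= 2\sqrt{C\,\delta_n} \to 0
\quad \text{as } \delta_n \to 0.
\]
```

The integrand is singular at zero but integrable, which is why an entropy bound of order $1/\varepsilon$ suffices here.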



Proof of Theorem 3. We first show that sup rg [ ,i]2 xZ \ v n { 



op(l) both under Hq and the local alternatives in the theorem, where 

Un[r) V^h y/Pz(Ui)-p z (Ui)' 2 

By Lemma A4, $\sup_{u\in[0,1]}\sup_{z\in\mathcal{Z}} |\hat p_{z,i}(u) - p_z(u)| \le v_n(u)$, where $v_n(u)$ is as 
defined in (31). By Assumption 1D(iv), $\hat p_{z,i}(u) \in (\varepsilon/2, 1 - \varepsilon/2)$, $\varepsilon > 0$, with 
probability approaching one. We introduce $G_\theta(\cdot)$ and $\mathcal{G}_n$ as in the proof of 
Theorem 1. Define 



\[
\varphi_z(u) = \frac{\sum_{j=1}^n 1\{z_j = z\}K_h(G_\theta(x_j) - u)}{\sum_{j=1}^n K_h(G_\theta(x_j) - u)}.
\]
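
The ratio just defined is a kernel-weighted frequency of the event $\{z_j = z\}$ among observations whose index value lies near $u$. A minimal Python sketch follows, assuming $K_h(t) = K(t/h)/h$ with an Epanechnikov kernel $K$ (choices made here for illustration; this passage does not specify them) and taking the index values $G_\theta(x_j)$ as precomputed inputs in the hypothetical list `g_values`.

```python
def epanechnikov(t):
    """Epanechnikov kernel, supported on [-1, 1] (assumed choice of K)."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def phi_z(u, z, g_values, z_values, h):
    """Kernel-weighted share of observations with z_j == z near index value u.

    g_values[j] stands for G_theta(x_j); h is the bandwidth.
    """
    weights = [epanechnikov((g - u) / h) / h for g in g_values]
    denom = sum(weights)
    if denom <= 0.0:
        # Mirrors the requirement that the denominator be bounded away from zero.
        raise ValueError("no observations within bandwidth of u")
    numer = sum(w for w, zj in zip(weights, z_values) if zj == z)
    return numer / denom


if __name__ == "__main__":
    g = [0.1, 0.2, 0.3, 0.4]
    zs = [1, 0, 1, 0]
    print(phi_z(0.25, 1, g, zs, h=0.2))
```

The explicit check on the denominator corresponds to the lower bound $c > 0$ imposed on the kernel sum in the definition of the function class that follows.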
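Let $\mathcal{J}_n^{\mathcal{Z}}$ be the class of functions $\varphi_z(G_\theta(\cdot))$ with $(\{(x_j,z_j)\}_{j=1}^n,\theta)$ running 
in $(\mathbf{R}^{d_X}\times\mathcal{Z})^n \times B(\theta_0, Mn^{-1/2})$ and such that for some $c > 0$, 

\[
\frac{1}{n}\sum_{j=1}^n K_h(G_\theta(x_j) - u) > c > 0
\]

and $|\varphi_z(G_\theta(\cdot)) - p_z(\cdot)| \le v_n(G_0(\cdot))$. By Lemma A2, for any $w > 0$, we can 
find large $M$ such that 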

p fe( i? n,^(^(-)))e^ 2 }>i-^ 

log Af N (e, J% 2 , L 2 {P)) < Ch- 3 ^£-^ +1 ^ q for some q > 3 and for all vr(-) G 

0<e/2< inf |vr(x)| < sup \n(x)\ < 1 - e/2 < 1 
from some sufficiently large n on. Define 

= {^(G(-), •, -JTT) : («, Z, 7T, G) G [0, 1] X 2X^ 2 X n }, 



where $\beta_{u,z}(\bar u,\bar z,x;\pi) = \beta_u(\bar u)\{1\{\bar z = z\} - \pi(x)\}/\sqrt{\pi(x) - \pi^2(x)}$. Then we can 
write 

1 n 

Vn{r) = —j=^l3 UtZ {Ui,Zi,Xi;p Z) i o (P^ . o \§))-fy {%) . 
v n i= i 

To show that it is asymptotically equivalent to $\nu_n^0(r)$ uniformly over 
$r \in [0,1]^2 \times \mathcal{Z}$, we consider the following process: 

1 n 

(47) - 7 =Y.( ) ^ X H&), (^)€[0,ljxi 



!=1 



Note that Puz(u, z, x; it) is Lipschitz in tt G (e/2, 1 — e/2) with a uniformly 
bounded coefficient. Therefore, log iVu (e, L 2 (-?*)) is bounded by 

log iV[.] (Ce, j;^ , L 2 (P) ) + log iV N (Ce, £ n , L 2 (P) ) 

(48) 

< Ch^' q e^ q+1)/q + Ce^ 1 - Cloge, 

where B n was defined prior to (33). Using this bound and proceeding with 
the proof of Theorem l(i) by replacing by 1 and B n by & n , we conclude 
that D n is asymptotically equivalent to rf. 
Let $\mathcal{J}_z \equiv \{\beta_u(\cdot)\gamma_y^{\perp}(\cdot)\{1\{\cdot = z\} - p_z(\cdot)\}/\sqrt{p_z(\cdot) - p_z^2(\cdot)} : (u,y) \in [0,1]^2\}$. 
Then, 

Therefore, $\bigcup_{z\in\mathcal{Z}}\mathcal{J}_z$ is $P$-Donsker because $\mathcal{Z}$ is a finite set. The weak 
convergence of $\nu_n^0$ to $\nu$ follows. The weak convergence of $\nu_n$ to $\nu$ follows from 
its asymptotic equivalence with $\nu_n^0$. □ 

Proof of Theorem 4. The proof is almost the same as the proof of 
Theorem 2. We omit the details. □ 

Acknowledgments. I thank an Associate Editor and a referee for their 
comments. I am grateful to Yoon-Jae Whang and Guido Kuersteiner who 
gave me valuable advice at the initial stage of this research. All errors are 
mine. 

REFERENCES 

Andrews, D. W. K. (1995). Nonparametric kernel estimation for semiparametric models. 
Econometric Theory 11 560-595. MR1349935 

Angrist, J. D. and Kuersteiner, G. M. (2004). Semiparametric causality tests using 
the policy propensity score. NBER Working Paper 10975. 

Angus, J. E. (1994). The probability integral transform and related results. SIAM Rev. 
36 652-654. MR1306928 

Bierens, H. J. (1990). A consistent conditional moment test of functional form. 
Econometrica 58 1443-1458. MR1080813 

Birman, M. S. and Solomjak, M. Z. (1967). Piecewise-polynomial approximation of 
functions of the classes $W_p^{\alpha}$. Mat. Sb. (N.S.) 73 331-355. MR0217487 

Blum, J. R., Kiefer, J. and Rosenblatt, M. (1961). Distribution-free tests of 
independence based on the sample distribution function. Ann. Math. Statist. 32 485-498. MR0125690 

Chiappori, P. -A. and Salanie, B. (2000). Testing for asymmetric information in insur- 
ance markets. J. Political Economy 108 56-78. 

de la Peña, V. H. and Giné, E. (1999). Decoupling: From Dependence to Independence. 
Springer, New York. MR1666908 

Delgado, M. A. and Gonzalez Manteiga, W. (2001). Significance testing in nonpara- 
metric regression based on the bootstrap. Ann. Statist. 29 1469-1507. MR1873339 

Delgado, M. A. and Mora, J. (2000). A nonparametric test for serial independence of 
regression errors. Biometrika 87 228-234. MR1766845 

Giné, E. (1997). Lecture Notes on Some Aspects of the Bootstrap. École d'Été de Calcul 
de Probabilités de Saint-Flour. Lecture Notes in Mathematics 1665. Springer, Berlin. 

Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric 
regression fits. Ann. Statist. 21 1926-1947. MR1245774 

Heckman, J. J., Ichimura, H. and Todd, P. (1997). Matching as an econometric eval- 
uation estimator: Evidence from evaluating a job training programme. Rev. Econom. 
Stud. 64 605-654. MR1623713 

Hoeffding, W. (1948). A nonparametric test of independence. Ann. Math. Statist. 19 
546-557. MR0029139 






Hong, Y. and White, H. (2005). Asymptotic distribution theory for nonparametric 
entropy measures of serial dependence. Econometrica 73 837-901. MR2135144 

Janssen, A. (2000). Global power functions of goodness-of-fit tests. Ann. Statist. 28 239- 
253. MR1762910 

Khmaladze, E. V. (1993). Goodness of fit problem and scanning innovation martingales. 
Ann. Statist. 21 798-829. MR1232520 

Lauritzen, S. L. (1996). Graphical Models. Oxford Univ. Press, New York. MR1419991 

Ledoux, M. and Talagrand, M. (1988). Un critère sur les petites boules dans le 
théorème limite central. Probab. Theory Related Fields 77 29-47. MR0921817 

Linton, O. and Gozalo, P. (1997). Conditional independence restrictions: Testing and 
estimation. Discussion Paper 1140, Cowles Foundation for Research in Economics, Yale 
Univ. 

Ossiander, M. (1987). A central limit theorem under metric entropy with $L_2$ bracketing. 
Ann. Probab. 15 897-919. MR0893905 

Pearl, J. (2000). Causality: Modeling, Reasoning, and Inference. Cambridge Univ. Press, 
New York. MR1744773 

Pollard, D. (1989). A maximal inequality for sums of independent processes under a 
bracketing entropy condition. Unpublished manuscript. 

Robinson, P. M. (1991). Consistent nonparametric entropy-based testing. Rev. Econom. 
Stud. 58 437-453. MR1108130 

Rosenblatt, M. (1952). Remarks on a multivariate transform. Ann. Math. Statist. 23 
470-472. MR0049525 

Skaug, H. J. and Tjøstheim, D. (1993). A nonparametric test of serial independence 
based on the empirical distribution function. Biometrika 80 591-602. MR1248024 

Stinchcombe, M. B. and White, H. (1998). Consistent specification testing with nui- 
sance parameters present only under the alternative. Econometric Theory 14 295-325. 
MR1628586 

Stute, W., Gonzalez Manteiga, W. and Quindimil, M. P. (1998). Bootstrap 
approximations in model checks for regression. J. Amer. Statist. Assoc. 93 141-149. MR1614600 

Stute, W. and Zhu, L. (2005). Nonparametric checks for single-index models. Ann. 
Statist. 33 1048-1083. MR2195628 

Su, L. and White, H. (2008). A nonparametric Hellinger metric test for conditional 
independence. Econometric Theory 24 829-864. MR2428851 

van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge Univ. Press, 
New York. 

van der Vaart, A. (1996). New Donsker classes. Ann. Probab. 24 2128-2140. MR1415244 

van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical 
Processes. Springer, New York. MR1385671 

Zhang, Z. (2008). Quotient correlation: A sample based alternative to Pearson's 
correlation. Ann. Statist. 36 1007-1030. MR2396823 

Department of Economics 

University of Pennsylvania 

528 McNeil Building 

3718 Locust Walk 

Philadelphia, Pennsylvania 19104 

USA 

E-MAIL: kysong@sas.upenn.edu 



